Machine learning assists prediction of genes responsible for plant specialized metabolite biosynthesis by integrating multi-omics data

Document type: Foreign-language journal

First author: Bai, Wenhui

Authors: Bai, Wenhui; Han, Xiaohong; Li, Cheng; Li, Wei; Wang, Li; Wang, Peipei; Wang, Hai

Author affiliations:

Keywords: Machine learning; Plant specialized metabolites; Genomics; Proteomics; AutoGluon-Tabular

Journal: BMC GENOMICS (Impact factor: 4.4; 5-year impact factor: 4.7)

ISSN: 1471-2164

Year, volume, issue: 2024, vol. 25, issue 1

Pages:

Indexed in: SCI

Abstract: Background: Plant specialized (or secondary) metabolites (PSM), also known as phytochemicals, natural products, or plant constituents, play essential roles in interactions between plants and their environment. Although many research efforts have focused on discovering novel metabolites and their biosynthetic genes, the resolution of metabolic pathways and identified biosynthetic genes was limited by rudimentary analysis approaches and the enormous number of candidate genes. Results: Here we integrated the state-of-the-art automated machine learning (ML) framework AutoGluon-Tabular with multi-omics data from Arabidopsis to predict genes encoding enzymes involved in the biosynthesis of PSM, focusing on the three main PSM categories: terpenoids, alkaloids, and phenolics. We found that genomics- and proteomics-related features were the top two categories of features contributing to model performance. Using only these key features, we built a new model in Arabidopsis, which performed better than models built with more features, including those related to transcriptomics and epigenomics. Finally, the built models were validated in maize and tomato, and models tested for maize and trained with data from two other species exhibited either equivalent or superior performance to intraspecies predictions. Conclusions: Our external validation results in grape and poppy on the one hand implied the applicability of our model to other species, and on the other hand showed enormous potential to improve the prediction of enzymes synthesizing PSM with the inclusion of valid data from a wider range of species.

Classification number:
