Revealing informative metabolites with random variable combination based on model population analysis for metabolomics data

Document type: Foreign-language journal

First author: Yun, Yong-Huan

Authors: Yun, Yong-Huan;Zhang, Jiachao;Chen, Haiming;Chen, Wenxue;Zhong, Qiuping;Zhang, Weimin;Chen, Weijun;Yun, Yong-Huan

Author affiliations:

Keywords: Metabolomics; Variable selection; Biomarker discovery; Informative metabolites; Variable combination; Model population analysis

Journal: CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (Impact factor: 3.491; 5-year impact factor: 3.839)

ISSN: 0169-7439

Year/volume/issue: 2020, vol. 197

Pages:

Indexed in: SCI

Abstract: The discovery of biomarkers is a critical step in metabolomics research. As high-resolution instruments generate increasingly complex metabolomics data, chemometricians and statisticians urgently need methods that efficiently reveal informative metabolites (variables). Based on the framework of model population analysis, this study proposes a strategy coupled with partial least squares discriminant analysis (PLS-DA), called revealing informative metabolites iteratively (RIMI). To account for the synergetic effect of multiple variables, a vast population of random variable combinations is generated. Only the variable combinations with higher model accuracy are used to build paired models, which statistically assess the importance of each variable according to its beneficial contribution to classification model performance. Four types of variables (strongly informative, weakly informative, noise, and interfering) are then identified based on the difference, and the significance of that difference, between the area under the receiver operating characteristic curve (AUROC) values obtained when each variable is excluded and when it is included. With this definition, unbeneficial variables (noise and interfering variables) are eliminated iteratively in a mild way. Strongly and weakly informative variables, regarded as beneficial variables, are retained, and their t-test P values are used to reveal the best variable subset. Because it exploits useful information from a vast number of well-performing variable combinations, RIMI greatly improved the accuracy of the classification model compared with other methods when applied to two metabolomics datasets. The results indicate that RIMI efficiently reveals informative metabolites and is a good alternative for biomarker discovery.
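
As a rough illustration of the paired-model idea in the abstract, the Python sketch below runs one assessment round on synthetic data. It is not the authors' implementation, and several pieces are assumptions made only to keep the example self-contained: a class-mean-difference scorer stands in for PLS-DA, a single fixed train/test split replaces any resampling scheme, the P value uses a normal approximation to the paired t-test, and the four-way labelling rule (sign of the mean AUROC difference combined with significance at `alpha`) is one plausible reading of the abstract. All names and parameters (`assess_variables`, `n_combos`, `keep_ratio`, `alpha`) are hypothetical.

```python
import math
import random
import statistics

def auroc(scores, labels):
    """AUROC as the probability that a positive sample outranks a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def model_auroc(train, test, cols):
    """Stand-in for PLS-DA (assumption): weight each variable by its
    class-mean difference on the training set, then score the test set."""
    (Xtr, ytr), (Xte, yte) = train, test
    if not cols:
        return 0.5  # no variables: chance level by convention
    w = []
    for j in cols:
        pos = [r[j] for r, y in zip(Xtr, ytr) if y == 1]
        neg = [r[j] for r, y in zip(Xtr, ytr) if y == 0]
        w.append(statistics.mean(pos) - statistics.mean(neg))
    scores = [sum(wj * r[j] for wj, j in zip(w, cols)) for r in Xte]
    return auroc(scores, yte)

def assess_variables(train, test, n_vars, n_combos=300, keep_ratio=0.2,
                     alpha=0.05, seed=0):
    rng = random.Random(seed)
    # Random variable combinations: each variable enters with probability 0.5.
    combos = [frozenset(j for j in range(n_vars) if rng.random() < 0.5)
              for _ in range(n_combos)]
    # Keep only the better-performing combinations.
    combos.sort(key=lambda c: model_auroc(train, test, sorted(c)), reverse=True)
    kept = combos[: int(n_combos * keep_ratio)]
    report = {}
    for j in range(n_vars):
        # Paired models: the same combination with and without variable j.
        deltas = [model_auroc(train, test, sorted(c | {j}))
                  - model_auroc(train, test, sorted(c - {j})) for c in kept]
        mean_d = statistics.mean(deltas)
        sd = statistics.stdev(deltas)
        if sd == 0:
            p = 1.0 if mean_d == 0 else 0.0
        else:
            # Normal approximation to the paired t-test (assumption).
            z = mean_d / (sd / math.sqrt(len(deltas)))
            p = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
        # Assumed labelling rule; the abstract does not spell it out.
        if mean_d > 0:
            kind = "strongly informative" if p < alpha else "weakly informative"
        else:
            kind = "interfering" if p < alpha else "noise"
        report[j] = (mean_d, p, kind)
    return report

def make_data(rng, n_per_class, n_vars):
    """Synthetic data: only variable 0 carries class signal."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_vars)]
            row[0] += 3.0 * label
            X.append(row)
            y.append(label)
    return X, y

rng = random.Random(42)
train = make_data(rng, 30, 6)
test = make_data(rng, 30, 6)
report = assess_variables(train, test, n_vars=6)
for j, (d, p, kind) in report.items():
    print(f"var {j}: mean dAUROC={d:+.3f}, p={p:.3g}, {kind}")
```

In this sketch a positive mean AUROC difference means that including the variable helps. A fuller version would repeat the round after dropping the variables labelled noise or interfering, which is the iterative elimination the abstract describes.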

Classification number:
