Discrimination of Gentiana and Its Related Species Using IR Spectroscopy Combined with Feature Selection and Stacked Generalization
Document type: Foreign-language journal article
First author: Shen, Tao
Authors: Shen, Tao; Yu, Hong; Wang, Yuan-Zhong
Author affiliations:
Keywords: NIR; FT-MIR; species identification; Gentiana; chemometrics; feature selection; stacked generalization
Journal: MOLECULES (impact factor: 4.411; five-year impact factor: 4.587)
ISSN:
Year, volume, issue: 2020, vol. 25, no. 6
Pages:
Indexed in: SCI
Abstract: Gentiana is one of the largest genera of Gentianoideae; most of its species have potential pharmaceutical value and are used in local traditional medicine. Because phytochemical composition and bioactive compounds differ among species, it is crucial to accurately identify authentic Gentiana species. In this paper, the feasibility of using infrared spectroscopy combined with chemometric analysis to identify Gentiana and its related species was studied. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000-4000 cm⁻¹) and Fourier transform mid-infrared (FT-MIR: 4000-600 cm⁻¹) spectroscopy. First, principal component analysis (PCA) was used to explore the natural grouping of the 180 samples. Second, random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using the full spectra (1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the results for the calibration and prediction sets, the MIR-SVM model had a higher classification accuracy than the other models. Five feature selection strategies, VIP (variable importance in the projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation, were then used to reduce the number of variables for modeling. As a result, 101 NIR and 73 FT-MIR bands were selected as the feature variables, respectively. Third, stacking models were built on the optimal spectral dataset. Most of the stacking models performed better than the full-spectra models. RF and SVM as base learners, combined with an SVM meta-classifier, formed the optimal stacked generalization strategy. For the SG-Ven-MIR-SVM model, the accuracy (ACC) was 100% for both the calibration set and the validation set. Sensitivity (SE), specificity (SP), efficiency (EFF), Matthews correlation coefficient (MCC), and Cohen's kappa coefficient (K) were all 1, which showed that this model had the best authenticity identification performance. These results indicate that stacked generalization combined with feature selection is probably an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The results can provide a valuable reference for the safe and effective clinical application of medicinal Gentiana.
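The evaluation parameters named in the abstract can be illustrated with a small worked computation. The sketch below is not the authors' code. It assumes the usual one-vs-rest definitions for a single class, and it takes EFF to be the geometric mean of SE and SP, a common chemometrics convention that the abstract itself does not spell out. The function names `one_vs_rest_counts` and `binary_metrics` and the toy labels are hypothetical.

```python
import math


def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN for one class treated as 'positive' against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Return SE, SP, EFF, MCC and Cohen's kappa from one-vs-rest counts."""
    n = tp + fp + tn + fn
    se = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    eff = math.sqrt(se * sp)                   # assumed: geometric mean of SE and SP

    # Matthews correlation coefficient; reported as 0.0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0  # undefined case reported as 0.0

    return {"SE": se, "SP": sp, "EFF": eff, "MCC": mcc, "K": kappa}


# Toy example: three hypothetical species labels, all predicted correctly.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]
perfect = binary_metrics(*one_vs_rest_counts(y_true, y_pred, "A"))

# A second toy case with some errors: 8 TP, 2 FP, 8 TN, 2 FN.
imperfect = binary_metrics(8, 2, 8, 2)
```

With error-free predictions every parameter equals 1, which matches the pattern reported for the SG-Ven-MIR-SVM model. Any misclassification pulls MCC and kappa down faster than accuracy alone.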
Classification number:
- Related literature
Other papers by the authors
-
Based on metabolomics and fourier transforms near infrared spectroscopy characterization of Lanxangia tsaoko chemical profile differences among fruit types and development of rapid identification and nutrient prediction models
Authors: Fu, Deng-Ke; Yang, Wei-Ze; Yang, Mei-Quan; Yang, Tian-Mei; Wang, Yuan-Zhong; Zhang, Jin-Yu
Keywords: Lanxangia tsao-ko; Phenotype; Metabolic differences; 2DCOS; ResNet; PLSR
-
FT-NIR Spectra of Different Dimensions Combined with Machine Learning and Image Recognition for Origin Identification: An Example of Panax notoginseng
Authors: Zuo, Zhi-Tian; Yao, Zeng-Yu; Wang, Yuan-Zhong
Keywords:
-
Application of ATR-FTIR Spectrum Combined With Ensemble Learning and Deep Learning for Identification of Amomum tsao-ko at Different Drying Temperatures
Authors: He, Gang; Yang, Shao-bing; Wang, Yuan-zhong
Keywords: Amomum tsao-ko; deep learning; drying temperatures; ensemble learning; machine learning
-
The Effects of Three Bean Shell Biochars Under Different Pyrolysis Temperatures on the Adsorption of Cd and Pb in Aqueous Solutions
Authors: Shen, Tao; Xia, Hongyu; Zhang, Heyi; Guang, Song; Hu, Wenwen; Zhao, Wenrui; Zhao, Kuan; Xiao, Xin; Zhang, Shiwen; Xu, Aiai
Keywords: bean shell biochar; representation; heavy metals; adsorption
-
Development and Validation of Multi-Locus GWAS-Based KASP Markers for Maize Ustilago maydis Resistance
Authors: Shen, Tao; Gao, Huawei; Zhu, Liying; Zhao, Yongfeng; Guo, Jinjie; Wang, Chao; Zheng, Yunxiao; Song, Weibin; Hou, Peng; Song, Wei
Keywords: corn smut; disease index; candidate genes; haplotype combinations; marker-assisted selection
-
Natural Variation of a Specific NLR Gene RGA4L Confers Strong Chilling Tolerance in Rice
Authors: Gan, Ping; Wang, Yongliang; Wei, Hanxing; Lu, Siyuan; Sun, Jinliang; Luo, Xianglan; Jia, Peilong; Cen, Weijian; Li, Rongbai; Luo, Jijing; Meng, Xiangbing; Li, Jiayang; Yu, Hong
Keywords: chilling tolerance; natural variation; quantitative trait loci; R protein; regulatory mechanism; rice
-
Spatial and temporal distribution characteristics of Paris polyphylla var. yunnanensis and the prediction of steroidal saponins content
Authors: Zhong, Chen; Li, Li; Wang, Yuan-Zhong
Keywords: Paris polyphylla var. yunnanensis; Habitat suitability; FT-IR spectroscopy; Chemometrics; Steroidal saponins