PlantLncBoost: key features for plant lncRNA identification and significant improvement in accuracy and generalization

Document type: Foreign-language journal article

First author: Tian, Xue-Chan

Authors: Tian, Xue-Chan; Jiang, Li-Bo; Nie, Shuai; Mao, Jian-Feng; Domingues, Douglas; Rossi Paschoal, Alexandre

Keywords: feature selection; Fourier transform; gradient boosting algorithms; long noncoding RNAs (lncRNAs); model selection; ORF coverage

Journal: NEW PHYTOLOGIST (impact factor: 8.1; five-year impact factor: 10.3)

ISSN: 0028-646X

Year, volume, issue: 2025, Vol. 247, No. 3

Indexed in: SCI

Abstract: Long noncoding RNAs (lncRNAs) are critical regulators of numerous biological processes in plants. Nevertheless, their identification is challenging because of their low sequence conservation across species. Existing computational methods for lncRNA identification often generalize poorly across diverse plant species, highlighting the need for more robust and versatile identification models. Here, we present PlantLncBoost, a novel computational tool designed to improve generalization in plant lncRNA identification. By integrating advanced gradient boosting algorithms with comprehensive feature selection, our approach achieves both high accuracy and generalizability. We conducted an extensive analysis of 1662 features and identified three key features (ORF coverage, complex Fourier average, and atomic Fourier amplitude) that effectively distinguish lncRNAs from mRNAs. We assessed the performance of PlantLncBoost using comprehensive datasets from 20 plant species. The model exhibited exceptional performance, with an accuracy of 96.63%, a sensitivity of 98.42%, and a specificity of 94.93%, significantly outperforming existing tools. Further analysis revealed that the selected features effectively capture the differences between lncRNAs and mRNAs across a variety of plant species. PlantLncBoost represents a significant advancement in plant lncRNA identification. It is freely accessible on GitHub and has been integrated into a comprehensive analysis pipeline, Plant-LncRNA-pipeline v.2.
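The abstract names ORF coverage as one of the three key features and reports accuracy, sensitivity, and specificity. The Python sketch below illustrates both under stated assumptions: ORF coverage is taken here as the length of the longest forward-strand ATG-to-stop ORF divided by transcript length (a common definition in coding-potential tools, not necessarily PlantLncBoost's exact rule), and lncRNA is treated as the positive class when computing the metrics. The function names are hypothetical and are not PlantLncBoost's API; the two Fourier features and the gradient boosting model are not reproduced.

```python
# Minimal sketch (assumptions, not PlantLncBoost's code):
#  - ORF = ATG ... in-frame stop codon, forward strand only, stop codon counted;
#    ORFs that run off the end of the transcript without a stop are ignored.
#  - ORF coverage = longest ORF length / transcript length.
#  - lncRNA is the positive class for sensitivity/specificity.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Return the length (nt) of the longest ATG-to-stop ORF in the three forward frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = 0
    for frame in range(3):
        start = None
        # Step codon by codon; the range bound keeps every slice a full codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def orf_coverage(seq: str) -> float:
    """Fraction of the transcript covered by its longest ORF (0.0 for an empty sequence)."""
    return longest_orf_length(seq) / len(seq) if seq else 0.0


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard accuracy, sensitivity (recall on positives), and specificity."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```

For example, with made-up counts, `classification_metrics(tp=90, tn=80, fp=20, fn=10)` returns `(0.85, 0.9, 0.8)`. Under this definition, a protein-coding transcript tends to have a long ORF spanning much of its length (high coverage), whereas an lncRNA usually contains only short, incidental ORFs (low coverage), which is why the feature separates the two classes.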
