
Plant-LncPipe: a computational pipeline providing significant improvement in plant lncRNA identification

Document type: Foreign-language journal article

Authors: Tian, Xue-Chan 1; Chen, Zhao-Yang 1; Nie, Shuai 2; Shi, Tian-Le 1; Yan, Xue-Mei 1; Bao, Yu-Tao 1; Li, Zhi-Chao 1; Ma, Hai-Yao 1; Jia, Kai-Hua 5; Zhao, Wei 6; Mao, Jian-Feng 1

Author affiliations: 1. Beijing Forestry Univ, Coll Biol Sci & Technol, Key Lab Genet & Breeding Forest Trees & Ornamental, Minist Educ, Natl Engn Lab Tree Breeding, State Key, Beijing 100083, Peoples R China

2. Guangdong Acad Agr Sci, Rice Res Inst, Guangzhou 510640, Peoples R China

3. Minist Agr & Rural Affairs, Key Lab Genet & Breeding High Qual Rice Southern C, Guangzhou 510640, Peoples R China

4. Guangdong Key Lab New Technol Rice Breeding, Guangzhou 510640, Peoples R China

5. Shandong Acad Agr Sci, Inst Crop Germplasm Resources, Key Lab Crop Genet Improvement & Ecol & Physiol, Jinan 250100, Peoples R China

6. Umea Univ, Umea Plant Sci Ctr UPSC, Dept Plant Physiol, S-90187 Umea, Sweden

Journal: HORTICULTURE RESEARCH (Impact factor: 8.7; 5-year impact factor: 9.0)

ISSN: 2662-6810

Year, volume, issue: 2024, Volume 11, Issue 4

Pages:

Indexed in: SCI

Abstract: Long non-coding RNAs (lncRNAs) play essential roles in various biological processes, such as chromatin remodeling, post-transcriptional regulation, and epigenetic modifications. Despite their critical functions in regulating plant growth, root development, and seed dormancy, the identification of plant lncRNAs remains a challenge due to the scarcity of specific and extensively tested identification methods. Most mainstream machine learning-based methods used for plant lncRNA identification were initially developed using human or other animal datasets, and their accuracy and effectiveness in predicting plant lncRNAs have not been fully evaluated or exploited. To overcome this limitation, we retrained several models, including CPAT, PLEK, and LncFinder, using plant datasets and compared their performance with mainstream lncRNA prediction tools such as CPC2, CNCI, RNAplonc, and LncADeep. Retraining these models significantly improved their performance, and two of the retrained models, LncFinder-plant and CPAT-plant, alongside their ensemble, emerged as the most suitable tools for plant lncRNA identification. This underscores the importance of model retraining in tackling the challenges associated with plant lncRNA identification. Finally, we developed a pipeline (Plant-LncPipe) that incorporates an ensemble of the two best-performing models and covers the entire data analysis process, including reads mapping, transcript assembly, lncRNA identification, classification, and origin, for the efficient identification of lncRNAs in plants. The pipeline, Plant-LncPipe, is available at: https://github.com/xuechantian/Plant-LncRNA-pipline.
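
Illustrative sketch (not part of the record): the abstract states that Plant-LncPipe uses an ensemble of CPAT-plant and LncFinder-plant but does not say how their predictions are combined. The Python sketch below assumes one simple possibility, a consensus rule that keeps only transcripts both models label as non-coding. The function name, the label strings, and the transcript IDs are hypothetical and are not taken from the pipeline or from either tool's output format.

```python
from typing import Dict, Set


def ensemble_lncrna_calls(
    calls_a: Dict[str, str],
    calls_b: Dict[str, str],
    noncoding_label: str = "noncoding",
) -> Set[str]:
    """Return transcript IDs that BOTH classifiers label as non-coding.

    Assumption: the ensemble is a strict intersection of the two
    models' non-coding calls. The actual pipeline may use another rule.
    """
    noncoding_a = {tid for tid, label in calls_a.items() if label == noncoding_label}
    noncoding_b = {tid for tid, label in calls_b.items() if label == noncoding_label}
    return noncoding_a & noncoding_b


# Hypothetical per-transcript labels, standing in for parsed tool output.
cpat_plant_calls = {"tx1": "noncoding", "tx2": "coding", "tx3": "noncoding"}
lncfinder_plant_calls = {"tx1": "noncoding", "tx2": "noncoding", "tx3": "coding"}

consensus = ensemble_lncrna_calls(cpat_plant_calls, lncfinder_plant_calls)
print(sorted(consensus))  # prints ['tx1']
```

An intersection rule favors precision over recall, since a transcript rejected by either model is dropped; a union rule would make the opposite trade-off.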
