Agricultural Research Institutions Repository Alliance

Improving large language models for miRNA information extraction via prompt engineering

Document type: Foreign-language journal article

First author: Wu, Rongrong

Authors: Wu, Rongrong; Zong, Hui; Wu, Erman; Li, Jiakun; Zhou, Yi; Zhang, Chi; Zhang, Yingbo; Wang, Jiao; Tang, Tong; Shen, Bairong

Author affiliations:

Keywords: MicroRNA; Cancer; Large language models; Information extraction; Datasets; Prompt engineering

Journal: COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE (Impact factor: 4.8; five-year impact factor: 5.4)

ISSN: 0169-2607

Year and volume: 2025, Vol. 271

Pages:

Indexed in: SCI

Abstract: Objective: Large language models (LLMs) demonstrate significant potential in biomedical knowledge discovery, yet their performance in extracting fine-grained biological information, such as miRNA, remains insufficiently explored. Accurate extraction of miRNA-related information is essential for understanding disease mechanisms and identifying biomarkers. This study aims to comprehensively evaluate the capabilities of LLMs in miRNA information extraction through diverse prompt learning strategies. Methods: Three high-quality miRNA information extraction datasets were constructed to support the benchmarking and training of generative LLMs, specifically Re-Tex, Re-miR and miR-Cancer. These datasets encompass three types of entities: miRNAs, genes, and diseases, along with their relationships. The accuracy and reliability of three LLMs, including GPT-4o, Gemini, and Claude, were evaluated and compared with traditional models. Different prompt engineering strategies were implemented to enhance the LLMs' performance, including baseline prompts, 5-shot Chain of Thought prompts, and generated knowledge prompts. Results: The combination of optimized prompt strategies significantly improved overall entity extraction performance across both trained and untrained datasets. Generated knowledge prompting achieved the highest performance, with maximum F1 scores of 76.6% for entity extraction and 54.8% for relationship extraction. Comparative analysis indicated GPT-4o exhibited superior performance to Gemini, while Claude showed the lowest performance levels. Extraction accuracy varied considerably across entity types, with miRNA recognition achieving the highest performance and gene/protein identification demonstrating the lowest accuracy levels. Furthermore, binary relationship extraction accuracy was significantly lower than entity extraction performance. The three evaluated LLMs showed similarly limited capability in relationship extraction tasks, with no statistically significant differences observed between models. Finally, comparison with conventional computational methods revealed LLMs have not yet exceeded traditional methods in this specialized domain. Conclusion: This study established high-quality miRNA datasets to support information extraction and knowledge discovery. The overall performance of LLMs in this study proved limited, and challenges remain in processing miRNA-related information extraction. However, optimized prompt combinations can substantially improve performance. Future work should focus on further refinement of LLMs to accelerate the discovery and application of potential diagnostic and therapeutic targets.

Classification number:

Related literature

Other papers by the authors

Maternal Supplementation of Collagen Peptide Chelated Trace Elements Enhances Skeletal Muscle Development in Chicks

Authors: Wang, Jiao; Lv, Zengpeng; Huang, Zhenwu; Li, Simeng

Keywords: Collagen peptide chelated trace elements; Maternal nutrition; Breeder hens; Muscle development; Offspring

Efficient conversion of insoluble dietary fiber to soluble dietary fiber by Bacillus subtilis BSNK-5 fermentation of okara and improvement of their structural and functional properties

Authors: Meng, Weimin; Hu, Miao; Zhang, Pengfei; Wang, Jiao; Yuan, Zifan; Wang, Fengzhong; Li, Shuying

Keywords: Okara; Soluble dietary fiber; Insoluble dietary fiber; Bacillus subtilis; Processing characteristics; Functional characteristics

Aligning Large Language Models with Humans: A Comprehensive Survey of ChatGPT's Aptitude in Pharmacology

Authors: Zhang, Yingbo; Ren, Shumin; Wang, Jiao; Lu, Junyu; Wu, Cong; He, Mengqiao; Liu, Xingyun; Wu, Rongrong; Zhao, Jing; Zhan, Chaoying; Singla, Rajeev K.; Shen, Bairong; Du, Dan; Zhan, Zhajun

Keywords:

Special expression of alanine-aminotransferase1 (OsAlaAT1) improves nitrogen utilization in wheat

Authors: Jiao, Bo; Wang, Jiao; Dong, Fushuang; Yang, Fan; Liu, Yongwei; Sun, Lei; Chai, Jianfang; Zhou, Shuo

Keywords:

Global Potential Geographic Distribution of Anthonomus eugenii Under Climate Change: A Comprehensive Analysis Based on an Ensemble Modeling Approach

Authors: Wang, Peilin; Wei, Dandan; Jiang, Hongbo; Yang, Ming; Zhao, Haoxiang; Zhang, Guifen; Xian, Xiaoqing; Zhang, Yibo; Zhang, Chi; Huang, Hongkun

Keywords: Biological invasion; Invasive alien insects; Species potential distribution; biomod2

Transformation effects of Bacillus subtilis BSNK-5 on okara: Insights into its component transformation, structural characteristics, and functional properties

Authors: Meng, Weimin; Hu, Miao; Gao, Yaxin; Zhang, Pengfei; Wang, Jiao; Yuan, Zifan; Li, Shuying; Wang, Fengzhong

Keywords: Okara; Bacillus subtilis BSNK-5; Structural characteristics; Processing characteristics; Functional characteristics

Identification of loci and candidate genes related to nodulation in soybean

Authors: Fan, Renzhong; Wang, Jiao; Yu, Deyue; Cheng, Hao; Chao, Shengqian

Keywords: Soybean; Root nodule; Genome-wide association studies; PCA; Transcriptome analysis
