Document type: Foreign-language journal article
Authors: Wu, Rongrong 1; Zong, Hui 1; Wu, Erman 1; Li, Jiakun 1; Zhou, Yi 1; Zhang, Chi 1; Zhang, Yingbo 1; Wang, Jiao 1; Tang, Tong 1; Shen, Bairong 1
Author affiliations: 1. Sichuan Univ, West China Hosp, Dept Urol, Chengdu, Peoples R China
2. Sichuan Univ, West China Hosp, Inst Syst Genet, Frontiers Sci Ctr Dis Related Mol Network, Chengdu, Peoples R China
3. Soochow Univ, Affiliated Hosp 1, Operat Management Dept, Suzhou, Peoples R China
4. Xinjiang Med Univ, Affiliated Hosp 1, Dept Neurosurg, Urumqi, Peoples R China
5. Sichuan Univ, West China Hosp, Dept Crit Care Med, Joint Lab Artificial Intelligence Crit Care Med, Chengdu, Peoples R China
6. Chinese Acad Trop Agr Sci, Trop Crops Genet Resources Inst, Haikou, Peoples R China
7. Sichuan Univ, West China Tianfu Hosp, Chengdu, Sichuan, Peoples R China
Keywords: MicroRNA; Cancer; Large language models; Information extraction; Datasets; Prompt engineering
Journal: COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE (Impact factor: 4.8; 5-year impact factor: 5.4)
ISSN: 0169-2607
Year/Volume: 2025, Vol. 271
Pages:
Indexed in: SCI
Abstract:
Objective: Large language models (LLMs) demonstrate significant potential in biomedical knowledge discovery, yet their performance in extracting fine-grained biological information, such as miRNA-related information, remains insufficiently explored. Accurate extraction of miRNA-related information is essential for understanding disease mechanisms and identifying biomarkers. This study aims to comprehensively evaluate the capabilities of LLMs in miRNA information extraction using diverse prompt learning strategies.
Methods: Three high-quality miRNA information extraction datasets, Re-Tex, Re-miR, and miR-Cancer, were constructed to support the benchmarking and training of generative LLMs. These datasets cover three entity types (miRNAs, genes, and diseases) along with the relationships between them. The accuracy and reliability of three LLMs, GPT-4o, Gemini, and Claude, were evaluated and compared with those of traditional models. Different prompt engineering strategies were implemented to improve the LLMs' performance, including baseline prompts, 5-shot Chain-of-Thought prompts, and generated knowledge prompts.
Results: Combining optimized prompt strategies significantly improved overall entity extraction performance on both trained and untrained datasets. Generated knowledge prompting achieved the highest performance, with maximum F1 scores of 76.6% for entity extraction and 54.8% for relationship extraction. Comparative analysis indicated that GPT-4o outperformed Gemini, while Claude performed worst. Extraction accuracy varied considerably across entity types: miRNA recognition achieved the highest performance, and gene/protein identification the lowest. Furthermore, binary relationship extraction accuracy was significantly lower than entity extraction performance. The three evaluated LLMs showed similarly limited capability in relationship extraction, with no statistically significant differences between models. Finally, comparison with conventional computational methods showed that LLMs have not yet surpassed traditional methods in this specialized domain.
Conclusion: This study established high-quality miRNA datasets to support information extraction and knowledge discovery. The overall performance of the LLMs evaluated was limited, and challenges remain in miRNA-related information extraction. However, optimized prompt combinations can substantially improve performance. Future work should focus on further refining LLMs to accelerate the discovery and application of potential diagnostic and therapeutic targets.
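The abstract reports extraction quality as F1 scores for entities and for relationships, but the record does not state how predictions were matched against the annotations. The sketch below assumes strict matching, where an item counts as correct only if it exactly equals a gold annotation from the same document, with counts micro-averaged over documents. The function name `micro_f1` and the toy annotations are hypothetical illustrations, not the authors' evaluation code or data from Re-Tex, Re-miR, or miR-Cancer.

```python
from typing import Hashable, Iterable, Set, Tuple


def micro_f1(
    gold_docs: Iterable[Set[Hashable]],
    pred_docs: Iterable[Set[Hashable]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 under strict (exact) matching.

    Assumption (not from the paper): each document is a set of items, e.g.
    (entity_type, text) pairs for entities or (head, relation, tail) triples
    for relationships; an item is a true positive only if it appears in the
    gold set of the same document.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)  # extracted and annotated
        fp += len(pred - gold)  # extracted but not annotated
        fn += len(gold - pred)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy example: the gene mention is extracted with the wrong text.
    gold = [{("miRNA", "miR-21"), ("gene", "PTEN"), ("disease", "glioma")}]
    pred = [{("miRNA", "miR-21"), ("gene", "PTEN1"), ("disease", "glioma")}]
    p, r, f = micro_f1(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints P=0.667 R=0.667 F1=0.667
```

Because the function only compares sets, the same code scores relationship extraction if each document's items are (head, relation, tail) triples. Under this strict criterion a near-miss such as "PTEN1" for "PTEN" costs both a false positive and a false negative, which is one plausible reason gene/protein scores could trail miRNA scores; the paper may instead use a relaxed or normalized matching rule.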