
Fd-CasBGRel: A Joint Entity-Relationship Extraction Model for Aquatic Disease Domains

Document type: Foreign-language journal article

Authors: Ye, Hongbao 1; Lv, Lijian 1; Zhou, Chengquan 2; Sun, Dawei 2

Author affiliations: 1.Zhejiang A&F Univ, Coll Math & Comp Sci, 666 Wusu St, Hangzhou 311300, Peoples R China

2.Zhejiang Acad Agr Sci, Agr Equipment Res Inst, 298 Desheng Middle Rd, Hangzhou 310021, Peoples R China

3.Minist Agr & Rural Affairs, Minist Prov Joint Construct, Key Lab Agr Equipment Southeast Hilly & Mountainou, 298 Desheng Middle Rd, Hangzhou 310021, Peoples R China

Keywords: relational extraction; aquatic diseases; Casrel; fine-tuned pretrained model; self-attention mechanisms; relative position coding; BiLSTM; GHM loss function

Journal: APPLIED SCIENCES-BASEL (Impact factor: 2.5; five-year impact factor: 2.7)

ISSN:

Year, volume, issue: 2024, vol. 14, no. 14

Pages:

Indexed in: SCI

Featured Application: The model is primarily used for entity-relationship extraction during the construction of an aquatic disease knowledge graph.

Abstract: Entity-relationship extraction plays a pivotal role in the construction of domain knowledge graphs. In the aquatic disease domain, however, it is a formidable task because of overlapping relationships, data specialization, limited feature fusion, and imbalanced data samples, all of which significantly weaken extraction performance. To tackle these challenges, this study uses published books and aquatic disease websites as data sources to compile a text corpus and establish datasets, and then proposes the Fd-CasBGRel model, tailored specifically to the aquatic disease domain. The model uses the Casrel cascading binary tagging framework to address relationship overlap; applies task fine-tuning for better performance on aquatic disease data; trains on specialized aquatic disease corpora to improve adaptability; and integrates the BRC feature fusion module (which incorporates self-attention mechanisms, BiLSTM, relative position encoding, and conditional layer normalization) to exploit entity position and context for enhanced fusion. It also replaces the traditional cross-entropy loss function with the GHM loss function to mitigate category imbalance. Experimental results show that Fd-CasBGRel reached an F1 score of 84.71% on the aquatic disease dataset, significantly outperforming several benchmark models. The model thus addresses the low triple (ternary) extraction performance caused by high data specialization, insufficient feature integration, and data imbalance. It achieved its highest F1 score, 86.52%, on the overlapping relationship category dataset, demonstrating robust capability in extracting overlapping data. Comparative experiments on the publicly available WebNLG dataset, in which the model obtained the best performance metrics among the compared models, indicate good generalization ability.
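The abstract says a GHM loss replaces cross-entropy to mitigate category imbalance, but this record does not give the formulation. As a rough illustration only, the sketch below follows the commonly described classification variant of gradient harmonizing (GHM-C): examples are binned by gradient norm |p - y|, and each example is down-weighted in proportion to how crowded its bin is. The function names `ghm_weights` and `ghm_c_loss`, the bin count, and the omission of momentum smoothing are assumptions made for this example, not details taken from the paper.

```python
import math


def ghm_weights(probs, labels, bins=10):
    """Per-example GHM-C style weights (hypothetical helper, simplified).

    probs  : predicted probabilities in [0, 1] for binary tags
    labels : ground-truth tags, 0 or 1
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    n = len(probs)
    # For sigmoid + binary cross-entropy, the gradient norm w.r.t. the logit is |p - y|.
    grad_norms = [abs(p - y) for p, y in zip(probs, labels)]
    # Assign each example to a gradient-norm bin; g == 1.0 goes into the last bin.
    bin_ids = [min(int(g * bins), bins - 1) for g in grad_norms]
    counts = [0] * bins
    for b in bin_ids:
        counts[b] += 1
    non_empty = sum(1 for c in counts if c > 0)
    # Examples in crowded bins (typically easy negatives) get small weights.
    return [n / (counts[b] * non_empty) for b in bin_ids]


def ghm_c_loss(probs, labels, bins=10):
    """Mean binary cross-entropy re-weighted by gradient density (assumed form)."""
    n = len(probs)
    if n == 0:
        return 0.0
    weights = ghm_weights(probs, labels, bins)
    eps = 1e-12  # keeps log() finite at p == 0 or p == 1
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += w * bce
    return total / n


# Eight easy negatives and one hard positive: the hard example dominates less
# under plain cross-entropy than under the harmonized weights.
probs = [0.05] * 8 + [0.25]
labels = [0] * 8 + [1]
weights = ghm_weights(probs, labels)
loss = ghm_c_loss(probs, labels)
```

In this toy batch the eight easy negatives share one bin and receive a small weight each, while the single hard positive sits alone in its bin and receives a larger one. When every example falls into the same bin, all weights equal 1 and the function reduces to ordinary mean cross-entropy.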
