A Joint Knowledge Extraction Model for Tobacco Pest and Disease Prevention Based on BERT+BA+CASREL

Document type: Foreign-language journal article

First author: Liu, Kehan

Authors: Liu, Kehan; Zhang, Feng; Wu, Qiulan; Sun, Ziruo; Sun, Xiang; Wu, Huarui

Keywords: Tobacco pest and disease prevention; knowledge extraction; joint knowledge extraction model

Journal: IEEE Access (impact factor: 3.6; five-year impact factor: 3.9)

ISSN: 2169-3536

Year, volume: 2025, Vol. 13

Indexed in: SCI

Abstract: To address overlapping triplets in long texts and data imbalance in knowledge extraction from tobacco pest and disease prevention texts, this study first built a corpus of such texts from published books and relevant websites and constructed a dataset from it. It then proposed a joint knowledge extraction model based on BERT+BA+CASREL and validated its performance experimentally. In this model, BERT uses the Transformer architecture to generate deep contextual semantic representations that capture complex semantic associations between words. The BA module combines BiLSTM and Self-Attention: BiLSTM performs bidirectional sequence modeling on the features extracted by BERT, improving the modeling of context and long-distance dependencies, while Self-Attention assigns a different weight to each input position, further strengthening the capture of long-range dependencies. The CASREL module, the core of the framework, uses a cascade binary tagging scheme to extract entities and relations jointly, which improves precision on complex relations and allows the whole model to be optimized end to end. In addition, the GHM loss function replaces the conventional cross-entropy loss to alleviate data imbalance. Experimental results show that the proposed model achieves a precision of 93.32%, a recall of 92.51%, and an F1-score of 92.91%, validating its effectiveness for overlapping triplet extraction in long texts and related problems. The model improves the precision of text knowledge extraction for tobacco pest and disease prevention and offers a new methodological perspective for related research areas.
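The abstract describes CASREL's cascade binary tagging only at a high level. The minimal Python sketch below illustrates the decoding step of that scheme under stated assumptions: token-level start/end probabilities for subjects, and per-relation object probabilities conditioned on each subject, are taken as given (the BERT and BA encoder is not modeled); a 0.5 threshold and the nearest start-end pairing rule of the original CasRel work are assumed. The tokens, the relation names `prevented_by` and `transmitted_by`, and the helpers `one_hot_probs` and `toy_object_tagger` are hypothetical illustrations, not the authors' code or data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices
# Hypothetical interface: maps a subject span to, for each relation,
# the object start/end probabilities conditioned on that subject.
ObjectTagger = Callable[[Span], Dict[str, Tuple[Sequence[float], Sequence[float]]]]


def decode_spans(start_probs: Sequence[float], end_probs: Sequence[float],
                 threshold: float = 0.5) -> List[Span]:
    """Pair each tagged start with the nearest tagged end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans: List[Span] = []
    for s in starts:
        later_ends = [e for e in ends if e >= s]
        if later_ends:  # a start with no matching end is discarded
            spans.append((s, later_ends[0]))
    return spans


def extract_triples(tokens: Sequence[str], subj_start: Sequence[float],
                    subj_end: Sequence[float], tag_objects: ObjectTagger,
                    threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Cascade decoding: find subjects first, then relation-specific objects."""
    triples: List[Tuple[str, str, str]] = []
    # Stage 1: binary start/end tags over the sentence yield all subjects.
    for subj in decode_spans(subj_start, subj_end, threshold):
        subject = " ".join(tokens[subj[0]:subj[1] + 1])
        # Stage 2: for this subject, one start/end tagger per relation.
        # A subject may yield several objects and relations (overlap).
        for relation, (obj_start, obj_end) in tag_objects(subj).items():
            for obj in decode_spans(obj_start, obj_end, threshold):
                triples.append((subject, relation,
                                " ".join(tokens[obj[0]:obj[1] + 1])))
    return triples


def one_hot_probs(n: int, hot: Sequence[int], hi: float = 0.9,
                  lo: float = 0.1) -> List[float]:
    """Toy probability vector: `hi` at the given positions, `lo` elsewhere."""
    return [hi if i in hot else lo for i in range(n)]


tokens = ["tobacco", "mosaic", "virus", "is", "prevented", "by",
          "resistant", "varieties", "and", "aphid", "control"]
n = len(tokens)


def toy_object_tagger(subj: Span) -> Dict[str, Tuple[List[float], List[float]]]:
    # Hypothetical tagger outputs; a real model would compute these from
    # encoder features fused with the subject representation.
    if subj == (0, 2):
        return {
            "prevented_by": (one_hot_probs(n, [6, 9]), one_hot_probs(n, [7, 10])),
            "transmitted_by": (one_hot_probs(n, []), one_hot_probs(n, [])),
        }
    return {}


triples = extract_triples(tokens, one_hot_probs(n, [0]), one_hot_probs(n, [2]),
                          toy_object_tagger)
for t in triples:
    print(t)
# ('tobacco mosaic virus', 'prevented_by', 'resistant varieties')
# ('tobacco mosaic virus', 'prevented_by', 'aphid control')
```

Because objects are tagged separately for every detected subject and every relation, one subject can appear in several triples at once, which is how a cascade tagging scheme of this kind accommodates overlapping triplets.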

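The abstract says the GHM loss replaces cross-entropy to counter data imbalance but does not give the exact formulation. The sketch below assumes the GHM-C (gradient harmonizing mechanism, classification) form originally proposed for single-stage object detection, applied to binary tags such as those CASREL produces. Each tag's gradient norm is g = |p - y|. Samples are grouped into equal-width bins of g, and each sample is reweighted by the inverse of its bin's population, normalized by the number of non-empty bins as common reference implementations do (no momentum term). The bin count of 10, the helper names, and the toy logits are illustrative assumptions.

```python
import math
from typing import List, Sequence


def bce_with_logits(x: float, y: int) -> float:
    """Numerically stable binary cross-entropy on a single logit."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def ghm_weights(logits: Sequence[float], targets: Sequence[int],
                bins: int = 10) -> List[float]:
    """Per-sample GHM-C weights: inverse gradient density per bin."""
    n = len(logits)
    if n == 0:
        raise ValueError("need at least one sample")
    # Gradient norm of BCE w.r.t. the logit is |sigmoid(x) - y|, in [0, 1].
    g = [abs(1.0 / (1.0 + math.exp(-x)) - y) for x, y in zip(logits, targets)]
    # Assign each sample to one of `bins` equal-width bins over [0, 1].
    idx = [min(int(gi * bins), bins - 1) for gi in g]
    counts = [0] * bins
    for b in idx:
        counts[b] += 1
    non_empty = sum(1 for c in counts if c > 0)
    # Crowded bins (typically many easy samples) get small weights.
    return [n / (counts[b] * non_empty) for b in idx]


def ghmc_loss(logits: Sequence[float], targets: Sequence[int],
              bins: int = 10) -> float:
    """Weighted BCE averaged over samples, using the GHM-C weights."""
    weights = ghm_weights(logits, targets, bins)
    total = sum(w * bce_with_logits(x, y)
                for x, y, w in zip(logits, targets, weights))
    return total / len(logits)


# Toy imbalanced batch: 18 easy negatives, 2 hard positives.
logits = [-4.0] * 18 + [-1.0] * 2
targets = [0] * 18 + [1] * 2
w = ghm_weights(logits, targets)
print(round(w[0], 3), w[-1])  # 0.556 5.0
mean_bce = sum(bce_with_logits(x, y) for x, y in zip(logits, targets)) / len(logits)
print(ghmc_loss(logits, targets), mean_bce)
```

In this toy batch the easy negatives, which dominate, are down-weighted while the rare hard positives are up-weighted. The weights still sum to the batch size, so the loss stays on the same scale as mean cross-entropy.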