
Recognition of the Agricultural Named Entities With Multifeature Fusion Based on ALBERT

Document type: Foreign-language journal article

Authors: Zhao, Pengfei 1; Wang, Wei 1; Liu, Hai 1; Han, Mo 1

Author affiliation: 1. Beijing Acad Agr & Forestry Sci, Natl Engn Res Ctr Informat Technol Agr, Beijing 100097, Peoples R China

Keywords: Feature extraction; Semantics; Task analysis; Hidden Markov models; Agriculture; Diseases; Convolutional neural networks; named entity recognition; self-attention; long short-term memory; conditional random field

Journal: IEEE ACCESS (impact factor: 3.476; five-year impact factor: 3.758)

ISSN: 2169-3536

Year/volume/issue: 2022, vol. 10

Pages:

Indexed in: SCI

Abstract: A high-quality agricultural named entity recognition (NER) model can provide effective support for agricultural information extraction, semantic retrieval, and other tasks. However, existing models ignore the latent features of Chinese characters, so they lack character-internal semantic information. Moreover, agricultural text sequences are long, which makes it difficult for models to capture long-distance dependencies. To address these problems, RSA-CANER is proposed: an agricultural NER model that combines a self-attention mechanism with the latent features of Chinese characters. First, the model takes character features and latent features of Chinese characters as input to enrich semantic information. Character features are obtained from the pre-trained ALBERT model, radical features are extracted with a convolutional neural network (CNN), and stroke features are extracted with a bidirectional long short-term memory (BiLSTM) network. Then, a BiLSTM produces the sequence feature matrix, and the self-attention mechanism further strengthens the model's ability to capture long-distance dependencies. Finally, a conditional random field (CRF) model generates the globally optimal tag sequence. The model obtains an F-score of 95.56%. The experimental results show that the model learns semantic information at the fine-grained levels of radicals and strokes, enriches the vector representation of target words, and achieves higher recognition precision than other models, improving its generalization ability.
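
The three stages named in the abstract (feature fusion, self-attention, CRF decoding) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names `fuse`, `self_attention`, and `viterbi` are hypothetical, fusion by concatenation is an assumption, the attention uses identity projections instead of learned weights, and the emission and transition scores would in practice come from trained BiLSTM and CRF layers.

```python
import math


def fuse(char_vecs, radical_vecs, stroke_vecs):
    """Fuse per-character features by concatenation (an assumption:
    the abstract does not state how the features are combined)."""
    return [c + r + s for c, r, s in zip(char_vecs, radical_vecs, stroke_vecs)]


def self_attention(X):
    """Scaled dot-product self-attention with identity projections.

    X is a list of n row vectors of dimension d. Each output row is a
    softmax-weighted average of all rows, so every position can draw on
    every other position regardless of distance in the sequence.
    """
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out


def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][k]   : score of tag k at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptr = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda k: score[k])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

With tags indexed as O=0, B=1, I=2 and a large negative transition score for O→I, `viterbi` avoids an invalid "O I" sequence even when the per-position emission scores favor it. This is the sense in which CRF decoding is globally rather than locally optimal.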
