Chat-RGIE: precision extraction of rice germplasm data using large language models and prompt engineering

Document type: Foreign-language journal

First author: Wei, Yijin

Authors: Wei, Yijin; Fan, Jingchao

Author affiliations:

Keywords: Data extraction; Large language model (LLM); Rice germplasm; Agriculture

Journal: JOURNAL OF BIG DATA (impact factor: 6.4; five-year impact factor: 13.4)

ISSN:

Year, volume, issue: 2025, vol. 12, no. 1

Pages:

Indexed in: SCI

Abstract: Varietal improvement is a key aspect of breeding, and as breeding work progresses, crop varietal data becomes more complex and requires more resources to extract. We therefore developed Chat-RGIE, a rice germplasm data extraction strategy based on conversational large language models (LLMs) and prompt engineering, to extract rice germplasm data in a zero-shot manner. The technique employs multi-response voting to limit the chance of hallucinations, together with an additional calibration component that selects the best extraction results. We evaluated Chat-RGIE in a performance evaluation and in a real-life data extraction evaluation. The scheme obtained 0.9102 precision, 0.9941 recall, and 0.9554 accuracy in the performance evaluation, and 0.6351 precision, 1.0 recall, and 0.8225 accuracy in the real-life data extraction evaluation, which supports the effectiveness of the scheme. The data extraction procedure also mitigates, to some extent, the risk that bias from a single large model leads to hallucinations: the incidence of hallucinations in the two evaluations was 0.0015 and 0.005, respectively, a very minor influence. In addition, we employed Restraint Rate, a statistic that quantifies how strongly the prompt constrains LLM replies; its values were 0.9265 and 0.911 in the two evaluations, indicating largely normative responses. Finally, when we examined the data extraction results, we found that when confronted with an unanswerable question, the LLM is affected by the stress imposed by the prompt: the higher the stress, the more likely it is to engage in constraint-violating behavior, which resembles what humans do when stressed. We therefore believe that some of the countermeasures used for this human behavior also have the potential to help improve LLM performance.
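The multi-response voting mentioned in the abstract can be illustrated with a minimal sketch. The record does not describe how Chat-RGIE actually implements voting, so everything here is an assumption made for illustration: the function name `vote`, the `min_share` threshold, and the whitespace and case normalization are hypothetical. The idea shown is simply to ask the model the same question several times and keep a value only when a strict majority of replies agree.

```python
from collections import Counter
from typing import Optional, Sequence


def vote(responses: Sequence[Optional[str]], min_share: float = 0.5) -> Optional[str]:
    """Return the value a strict majority of responses agree on, else None.

    Hypothetical helper, not taken from the paper: `responses` holds the
    values extracted from repeated LLM replies to the same prompt.
    """
    # Normalize so trivial formatting differences do not split the vote.
    cleaned = [r.strip().lower() for r in responses if r and r.strip()]
    if not cleaned:
        return None
    value, count = Counter(cleaned).most_common(1)[0]
    # The share is taken over all responses, so empty replies count against agreement.
    return value if count / len(responses) > min_share else None
```

With three replies such as `"120 days"`, `"120 Days "`, and `"135 days"`, two normalize to the same string and the function keeps that value; when every reply differs, it returns `None` rather than guessing, which is one simple way to keep an isolated hallucinated value out of the extracted data.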

Classification number:
