A Feature Engineering Method for Whole-Genome DNA Sequence with Nucleotide Resolution
Document type: Foreign-language journal article
First author: Wang, Ting
Authors: Wang, Ting; Cui, Yunpeng; Sun, Tan; Li, Huan; Hou, Ying; Wang, Mo; Chen, Li; Wu, Jinming; Wang, Chao
Author affiliations:
Keywords: feature construction; genetic selection; omics analysis; large language model; agronomic trait prediction
Journal: INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (impact factor: 4.9; five-year impact factor: 5.7)
ISSN: 1661-6596
Year, volume, issue: 2025, vol. 26, no. 5
Pages:
Indexed in: SCI
Abstract: Feature engineering for whole-genome DNA sequences plays a critical role in predicting plant phenotypic traits. However, because of limits on model analytical capability and computational resources, existing methods are predominantly confined to SNP-based approaches, which typically extract genetic variation sites for dimensionality reduction before feature extraction. These methods not only suffer from incomplete locus coverage and insufficient genetic information but also overlook the relationships between nucleotides, thereby restricting the accuracy of phenotypic trait prediction. Given the parallels between gene sequences and natural language, large language models (LLMs) offer novel approaches to constructing genome-wide feature representations with nucleotide granularity. This study proposes FE-WDNA, a whole-genome DNA sequence feature engineering method that fine-tunes HyenaDNA on whole-genome data from 1000 soybean samples. The method captures contextual and long-range dependencies among nucleotide sites to derive comprehensive genome-wide feature vectors. We further evaluated FE-WDNA in agronomic trait prediction, examining factors such as the context window length of the DNA input, feature vector dimensions, and trait prediction methods, and achieved significant improvements over existing SNP-based approaches. FE-WDNA provides high-quality DNA sequence feature engineering at nucleotide resolution, which can be transferred to other plants and directly applied to various computational breeding tasks.
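The abstract does not say how per-nucleotide model outputs are reduced to a fixed-length feature vector. As a rough illustration only, the sketch below shows one plausible shape of such a pipeline: tokenize one base per token, split the sequence into context windows, embed each token, and mean-pool. Everything here is an assumption for illustration: the function names are hypothetical, the random embedding table is a stand-in for a fine-tuned model such as HyenaDNA, and mean pooling is not confirmed as the authors' choice.

```python
import random

NUCLEOTIDES = "ACGTN"
TOKEN_ID = {base: i for i, base in enumerate(NUCLEOTIDES)}


def tokenize(sequence):
    """Map each nucleotide to an integer id (one token per base); unknown symbols map to N."""
    return [TOKEN_ID.get(base, TOKEN_ID["N"]) for base in sequence.upper()]


def split_windows(tokens, window_length):
    """Cut the token list into consecutive, non-overlapping context windows."""
    return [tokens[i:i + window_length] for i in range(0, len(tokens), window_length)]


def make_embedding_table(dim, seed=0):
    """Hypothetical stand-in for a trained model: one fixed random vector per token."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in NUCLEOTIDES]


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sequence_features(sequence, window_length=8, dim=4):
    """Embed every base, pool within each window, then pool across windows."""
    table = make_embedding_table(dim)
    windows = split_windows(tokenize(sequence), window_length)
    window_vectors = [mean_pool([table[t] for t in w]) for w in windows]
    return mean_pool(window_vectors)


features = sequence_features("ACGTTGCAACGTNNACGT", window_length=8, dim=4)
print(len(features))  # one value per feature dimension
```

Unlike an SNP-based pipeline, every position contributes to the result here, and the window length and vector dimension are the two knobs the abstract names as evaluation factors. A real model would also make each token's embedding depend on its neighbors, which this lookup table does not.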
Classification number:
- Related literature
Other papers by the authors
-
Bioinspired Janus starch film with dual functionality via citral nanoemulsion-mediated interfacial self-assembly for fresh-cut fruits and vegetables preservation
Authors: Xie, Ying; Ding, Ke; Zhang, Shikai; Xu, Saiqing; Li, Huan; Lin, Shuhua; Shan, Yang; Ding, Shenghua; Sun, Yuying; Li, Yawen; Wang, Rongrong
Keywords: Starch film; Janus structure; Rapid self-assembly; Fresh-cut fruits and vegetables preservation
-
Characteristics and phytotoxicity of hydrochar-derived dissolved organic matter: Effects of feedstock type and hydrothermal temperature
Authors: Lang, Qianqian; Guo, Xuan; Wang, Chao; Li, Lingyao; Li, Yufei; Xu, Junxiang; Zhao, Xiang; Li, Jijin; Liu, Bensheng; Sun, Qinping; Zou, Guoyuan
Keywords: Hydrochar; Hydrothermal temperature; Dissolved organic matter; Excitation emission matrix; Parallel factor analysis; Phytotoxicity
-
Pristine/magnesium-loaded biochar and ZVI affect rice grain arsenic speciation and cadmium accumulation through different pathways in an alkaline paddy soil
Authors: Zhang, Chen; Shi, Dong; Wang, Chao; Hu, Yanxia; Li, Xiaona; Hou, Yanhui; Zheng, Ruilun; Li, Huafen; Sun, Guoxin
Keywords: Cadmium (Cd); Arsenic (As) speciation; Co-contamination; Magnesium-loaded biochar; Zero-valent iron (ZVI); Rice
-
A Comprehensive Review of Diatom-Bacterial Interactions Inferred From Bibliometric Analysis
Authors: Hu, Caiqin; Hu, Tiehuan; Gao, Yuan; Liu, Qianfu; Wang, Chao; Shi, Zhen
Keywords: bacteria; bibliometric analysis; diatom; microbial interactions; research foci
-
Cross-Shaped Heat Tensor Network for Morphometric Analysis Using Zebrafish Larvae Feature Keypoints
Authors: Chai, Xin; Li, Zhaoxin; Zhang, Yanqi; Sun, Qixin; Zhang, Ning; Chai, Xiujuan; Sun, Tan; Qiu, Jing
Keywords: zebrafish; digital phenotype; non-destructive examination; keypoints localization; deep feature learning
-
Absorption and transport mechanism of colloidal nanoparticles (CNPs) in lamb soup based on Caco-2 cell
Authors: Fu, Jianing; Liu, Ling; Li, Shaobo; Xu, Meizhen; Chen, Li; Zhang, Dequan
Keywords: Colloidal nanoparticles; Caco-2 cell; Lamb soup; Absorption mechanism; Transport
-
Transcriptomic analysis reveals the mechanism of ozone fumigation combined with polyethylene nanopackaging for delaying the browning and softening of mushrooms (Agaricus bisporus)
Authors: Wang, Biao; Yun, Jianmin; Guo, Weihong; Shen, Jiawei; Zhao, Fengyun; Qu, Yuling; Wang, Ting; Yao, Liang
Keywords: Polyethylene nanopackaging; Ozone fumigation; Browning; Softening; Agaricus bisporus; Regulation mechanism