A Feature Engineering Method for Whole-Genome DNA Sequence with Nucleotide Resolution
Document type: Journal article (foreign-language)
First author: Wang, Ting
Authors: Wang, Ting; Cui, Yunpeng; Sun, Tan; Li, Huan; Hou, Ying; Wang, Mo; Chen, Li; Wu, Jinming; Wang, Chao
Keywords: feature construction; genetic selection; Omics analysis; large language model; agronomic trait prediction
Journal: INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (Impact factor: 4.9; five-year impact factor: 5.7)
ISSN: 1661-6596
Year/Volume/Issue: 2025, Vol. 26, No. 5
Indexed in: SCI
Abstract: Feature engineering for whole-genome DNA sequences plays a critical role in predicting plant phenotypic traits. However, because of limited model analytical capacity and computational resources, existing methods are largely confined to SNP-based approaches, which typically extract genetic variant sites to reduce dimensionality before feature extraction. These methods not only suffer from incomplete locus coverage and insufficient genetic information but also overlook the relationships between nucleotides, which limits the accuracy of phenotypic trait prediction. Given the parallels between gene sequences and natural language, large language models (LLMs) offer new approaches to the challenge of constructing genome-wide feature representations at nucleotide granularity. This study proposes FE-WDNA, a whole-genome DNA sequence feature engineering method that fine-tunes HyenaDNA on whole-genome data from 1000 soybean samples. The fine-tuned model captures contextual and long-range dependencies among nucleotide sites to derive comprehensive genome-wide feature vectors. We further evaluated FE-WDNA for agronomic trait prediction, examining factors such as the context window length of the DNA input, the feature vector dimension, and the trait prediction method, and achieved significant improvements over existing SNP-based approaches. FE-WDNA provides high-quality DNA sequence feature engineering at nucleotide resolution that can be transferred to other plant species and applied directly to various computational breeding tasks.
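The general shape of the pipeline the abstract outlines (single-nucleotide input, fixed-length context windows, pooling into a fixed-dimension genome-wide vector) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' FE-WDNA implementation: all helper names are hypothetical, and `make_embedding_table` is a placeholder that gives each nucleotide a fixed random vector, whereas a fine-tuned HyenaDNA model would produce context-dependent embeddings. Mean pooling within and across windows is likewise an assumed aggregation choice. The `window` and `dim` parameters correspond to the context window length and feature vector dimension that the study examines.

```python
import random
from typing import List

# Single-nucleotide vocabulary; any unrecognized base is mapped to "N".
NUCLEOTIDES = "ACGTN"


def tokenize(seq: str) -> List[int]:
    """Character-level (one token per nucleotide) encoding of a DNA string."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    return [index.get(base, index["N"]) for base in seq.upper()]


def split_windows(tokens: List[int], window: int) -> List[List[int]]:
    """Cut the token sequence into non-overlapping context windows.

    The final window may be shorter than `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


def make_embedding_table(dim: int, seed: int = 0) -> List[List[float]]:
    """HYPOTHETICAL placeholder for a fine-tuned model: one fixed random
    vector per nucleotide. A real model's embeddings depend on context."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in NUCLEOTIDES]


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def genome_feature_vector(seq: str, window: int, dim: int) -> List[float]:
    """Embed each window per nucleotide, pool each window, then pool windows.

    Every window gets equal weight in the final average, including a
    shorter trailing window (an assumed aggregation choice).
    """
    tokens = tokenize(seq)
    if not tokens:
        raise ValueError("sequence is empty")
    table = make_embedding_table(dim)
    window_vectors = [
        mean_pool([table[t] for t in w]) for w in split_windows(tokens, window)
    ]
    return mean_pool(window_vectors)


vec = genome_feature_vector("ACGTACGTNN", window=4, dim=8)
print(len(vec))  # 8
```

Each sample then yields one vector of length `dim` regardless of genome length, so the per-sample vectors could be stacked into a matrix and passed to any regression model for trait prediction; swapping the placeholder table for a real model's per-nucleotide outputs would leave the windowing and pooling steps unchanged.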