Prioritized SNP Selection from Whole-Genome Sequencing Improves Genomic Prediction Accuracy in Sturgeons Using Linear and Machine Learning Models

Document type: Foreign-language journal article

First author: Song, Hailiang

Authors: Song, Hailiang; Wang, Wei; Dong, Tian; Yan, Xiaoyu; Geng, Chenfan; Bai, Song; Hu, Hongxia

Keywords: sturgeon; genomic prediction; GWAS; machine learning models; SNP density; aquaculture industry; breeding; caviar traits

Journal: INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (impact factor: 4.9; five-year impact factor: 5.7)

ISSN: 1661-6596

Year, volume, issue: 2025, vol. 26, no. 14

Indexed in: SCI

Abstract: Genomic prediction has emerged as a powerful tool in aquaculture breeding. Its effectiveness, however, depends on careful selection of informative single nucleotide polymorphisms (SNPs) and on the choice of an appropriate prediction model. This study aimed to improve genomic prediction accuracy in Russian sturgeon (Acipenser gueldenstaedtii) by optimizing SNP selection strategies and comparing the performance of linear and machine learning models. Three economically important traits (caviar yield, caviar color, and body weight) were chosen for their direct relevance to breeding goals and market value. Whole-genome sequencing (WGS) data were obtained from 971 individuals at an average sequencing depth of 13.52×. To reduce marker density and eliminate redundancy, three SNP selection strategies were applied: (1) genome-wide association study (GWAS)-based prioritization to select trait-associated SNPs; (2) linkage disequilibrium (LD) pruning to retain independent markers; and (3) random sampling as a control. Genomic prediction was conducted with both linear models (e.g., GBLUP) and machine learning models (e.g., random forest) across SNP densities from 1 K to 50 K. GWAS-based SNP selection consistently outperformed the other strategies, especially at moderate densities (≥10 K), and improved prediction accuracy by up to 3.4% relative to the full WGS dataset. LD-based selection at higher densities (30 K and 50 K) performed comparably to the full WGS dataset. Notably, machine learning models, particularly random forest, outperformed linear models and yielded an additional 2.0% gain in accuracy when combined with GWAS-selected SNPs. In conclusion, integrating WGS data with GWAS-informed SNP selection and machine learning models offers a promising framework for improving genomic prediction in sturgeon, and the approach may be applicable more broadly in aquaculture breeding programs.
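
The three SNP selection strategies described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and does not reproduce the authors' pipeline. The abstract does not specify the GWAS model, the LD threshold, or the window size, so the sketch makes three substitutions:

- It ranks SNPs by the squared correlation between genotype and phenotype, a single-marker stand-in for GWAS p-values.
- It prunes with a greedy windowed r² filter, loosely modelled on PLINK's --indep-pairwise.
- It randomly down-samples any surplus markers to reach the target density k.

The function names (gwas_select, ld_prune, random_select) and the defaults threshold=0.2 and window=50 are hypothetical. Genotypes are assumed to be coded 0/1/2 and stored as one row per individual, with SNP columns in genomic order.

```python
import random
from statistics import fmean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant phenotype carries no information
    return sxy * sxy / (sxx * syy)


def column(genotypes, j):
    """Genotype codes (0/1/2) of SNP j across all individuals."""
    return [row[j] for row in genotypes]


def gwas_select(genotypes, phenotypes, k):
    """Hypothetical GWAS-style prioritization: keep the k SNPs with the highest
    single-marker r^2 against the phenotype (assumption: a real GWAS would
    typically fit a mixed model and rank by p-value)."""
    m = len(genotypes[0])
    scores = [r_squared(column(genotypes, j), phenotypes) for j in range(m)]
    ranked = sorted(range(m), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def ld_prune(genotypes, k, threshold=0.2, window=50, seed=1):
    """Hypothetical greedy LD pruning: walk SNPs in genomic order and keep a SNP
    only if its r^2 with every already-kept SNP within `window` positions
    upstream is below `threshold`. Surplus SNPs are randomly dropped to reach k."""
    m = len(genotypes[0])
    kept = []
    for j in range(m):
        cj = column(genotypes, j)
        if all(r_squared(cj, column(genotypes, i)) < threshold
               for i in kept if j - i <= window):
            kept.append(j)
    if len(kept) > k:
        kept = sorted(random.Random(seed).sample(kept, k))
    return kept


def random_select(n_snps, k, seed=1):
    """Control strategy: k SNP indices drawn uniformly without replacement."""
    return sorted(random.Random(seed).sample(range(n_snps), k))


if __name__ == "__main__":
    # Toy data: SNP 1 duplicates SNP 0 (perfect LD), and the phenotype tracks SNP 0.
    snp0 = [0, 0, 1, 1, 2, 2, 0, 2]
    snp2 = [0, 1, 0, 1, 0, 1, 0, 1]
    snp3 = [2, 1, 1, 0, 0, 1, 2, 0]
    G = [[a, a, b, c] for a, b, c in zip(snp0, snp2, snp3)]
    y = [2.0 * a + 1.0 for a in snp0]
    print(gwas_select(G, y, 2), ld_prune(G, k=4), random_select(4, 2))
```

Each function could be called with k=10_000 to build a 10 K panel for each strategy, and the selected columns could then be passed to whichever prediction model is being benchmarked. Any GWAS ranking used for prediction would normally be computed on the training individuals only, so that validation phenotypes do not leak into marker selection. The pure-Python loops scale poorly and suit only toy data. WGS-scale panels call for dedicated tools such as PLINK.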
