A high-throughput screening method for selecting feature SNPs to evaluate breed diversity and infer ancestry

Document type: Foreign-language journal

First author: Zhang, Meilin

Authors: Zhang, Meilin; Du, Heng; Zhang, Yu; Zhuo, Yue; Liu, Zhen; Xue, Yahui; Zhou, Lei; Li, Wanying; Liu, Jian-Feng; Zhou, Sixuan

Author affiliations:

Journal: GENOME RESEARCH (Impact factor: 5.5; 5-year impact factor: 7.3)

ISSN: 1088-9051

Year, volume, issue: 2025, Vol. 35, No. 8

Pages:

Indexed in: SCI

Abstract: As the scale of deep whole-genome sequencing (WGS) data has grown exponentially, hundreds of millions of single nucleotide polymorphisms (SNPs) have been identified in livestock. Utilizing these massive SNP data in population stratification analysis, ancestry prediction, and breed diversity assessments leads to overfitting issues in computational models and creates computational bottlenecks. Therefore, selecting genetic variants that express high amounts of information for use in population diversity studies and ancestry inference becomes critically important. Here, we develop a method, HITSNP, that combines feature selection and machine learning algorithms to select high-representative SNPs that can effectively estimate breed diversity and infer ancestry. HITSNP outperforms existing feature selection methods in estimating accuracy and computational stability. Furthermore, HITSNP offers a new algorithm to predict the number and composition of ancestral populations using a small number of SNPs, and avoiding calculating the number of clusters. Taken together, HITSNP facilitates the research of population structure, animal breeding, and animal resource protection.

Classification number:
