SMAP is a pipeline for sample matching in proteogenomics

Document type: Foreign-language journal article

First author: Li, Ling

Authors: Li, Ling; Erickson, Alyssa; Rowbotham, Kincaid; Huang, He; Wang, Xusheng; Niu, Mingming; Li, Yuxin; Peng, Junmin; Luo, Jie; Guo, Kai; Jiang, Yi; Hur, Junguk; Liu, Chunyu

Author affiliations:

Journal: NATURE COMMUNICATIONS (Impact factor: 17.694; 5-year impact factor: 17.763)

ISSN:

Year, volume, issue: 2022, Volume 13, Issue 1

Pages:

Indexed in: SCI

Abstract: The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS), and aligns the MS-based proteomic samples with genomic samples by two discriminant scores. Theoretical analysis with simulated data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when at least 20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in a large-scale MS-based proteogenomics study. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based version can be accessed at https://smap.shinyapps.io/smap/.

Classification number:
