
SMAP is a pipeline for sample matching in proteogenomics

Document type: Foreign-language journal article

Authors: Li, Ling 1; Niu, Mingming 2; Erickson, Alyssa 1; Luo, Jie 4; Rowbotham, Kincaid 1; Guo, Kai 5; Huang, He 1; Li, Yuxin 2; Jiang, Yi 6; Hur, Junguk 7; Liu, Chunyu 8; Peng, Junmin 2; Wang, Xusheng 1

Author affiliations: 1. Univ North Dakota, Dept Biol, Grand Forks, ND 58202 USA

2. St Jude Childrens Res Hosp, Ctr Prote & Metabol, Dept Struct Biol, 332 N Lauderdale St, Memphis, TN 38105 USA

3. St Jude Childrens Res Hosp, Ctr Prote & Metabol, Dept Dev Neurobiol, 332 N Lauderdale St, Memphis, TN 38105 USA

4. Zhejiang Acad Agr Sci, State Key Lab Managing Biot & Chem Threats Qual &, Hangzhou 310021, Peoples R China

5. Univ Michigan, Dept Neurol, Ann Arbor, MI 48109 USA

6. Huazhong Univ Sci & Technol, Tongji Med Coll, Sch Publ Hlth, Dept Epidemiol & Biostat, Wuhan 430030, Peoples R China

7. Univ North Dakota, Sch Med & Hlth Sci, Dept Biomed Sci, Grand Forks, ND 58202 USA

8. SUNY Upstate Med Univ, Dept Psychiat, Syracuse, NY 13210 USA

Journal: NATURE COMMUNICATIONS (impact factor: 17.694; five-year impact factor: 17.763)

ISSN:

Year, volume, issue: 2022, vol. 13, no. 1

Pages:

Indexed in: SCI

Abstract: The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS) and aligns the MS-based proteomic samples with genomic samples using two discriminant scores. Theoretical analysis with simulated data indicates that SMAP can uniquely match proteomic and genomic samples when ≥20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in large-scale MS-based proteogenomics studies. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based version can be accessed at https://smap.shinyapps.io/smap/.
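The abstract does not define SMAP's two discriminant scores. As a rough, non-authoritative illustration of genotype-based sample matching in general, the Python sketch below compares variant genotypes inferred from proteomic samples with genotypes from genomic samples, takes the most concordant genomic sample as the match, and reports two illustrative scores: the concordance itself and its margin over the runner-up. The 0/1/2 dosage encoding, the margin score, and all function names (`concordance`, `match_samples`, `flag_mixups`) are hypothetical assumptions, not SMAP's actual code or scoring. The real implementation is in the GitHub repository linked in the abstract.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: variant ID -> alternate-allele dosage (0, 1 or 2).
Genotype = Dict[str, int]
# (best genomic sample ID, best concordance, margin over the runner-up)
Match = Optional[Tuple[str, float, float]]


def concordance(a: Genotype, b: Genotype) -> Optional[float]:
    """Fraction of variants typed in both samples whose dosages agree."""
    shared = a.keys() & b.keys()
    if not shared:
        return None  # no overlapping variants, so no evidence either way
    return sum(a[v] == b[v] for v in shared) / len(shared)


def match_samples(proteomic: Dict[str, Genotype],
                  genomic: Dict[str, Genotype]) -> Dict[str, Match]:
    """For each proteomic sample, find the most concordant genomic sample."""
    results: Dict[str, Match] = {}
    for p_id, p_gt in proteomic.items():
        scores: Dict[str, float] = {}
        for g_id, g_gt in genomic.items():
            c = concordance(p_gt, g_gt)
            if c is not None:
                scores[g_id] = c
        if not scores:
            results[p_id] = None
            continue
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best_id, best = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        # Hypothetical second score: how clearly the best match beats the rest.
        results[p_id] = (best_id, best, best - runner_up)
    return results


def flag_mixups(results: Dict[str, Match], labels: Dict[str, str]) -> List[str]:
    """Proteomic samples whose best genomic match disagrees with their label."""
    return sorted(p_id for p_id, m in results.items()
                  if m is None or m[0] != labels.get(p_id))
```

In this sketch, a proteomic sample whose best match differs from its recorded genomic label, especially with a large margin, is a candidate mix-up. A small margin suggests too few informative shared variants to decide.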
