AlphaCD: a machine learning model capable of highly accurate characterization for 21,335 cytidine deaminases

Document type: Foreign-language journal article

First author: Xu, Kui

Authors: Xu, Kui; Hua, Guoying; Wu, Mingdi; Zhang, Haihang; Liu, Jingda; Feng, Hu; Zuo, Erwei

Author affiliations:

Journal: CELL RESEARCH (impact factor: 25.9; five-year impact factor: 36.6)

ISSN: 1001-0602

Year, volume, issue: 2025

Pages:

Indexed in: SCI

Abstract: The vast scope but limited supporting evidence in sequence databases hinders identification of proteins with specific functionality. Here, we experimentally characterized catalytic efficiency, target site window, motif preference, and off-target activity of 1100 apolipoprotein B mRNA-editing enzyme, catalytic polypeptide (APOBEC)-like family cytidine deaminases (CDs) fused with nCas9 in HEK293T cells, thereby generating the largest dataset of experimentally validated functions for a single protein family to date. These data, together with amino acid sequence, three-dimensional structure, and eight additional features, were used to construct a machine learning (ML) model, AlphaCD, which showed high accuracy in predicting catalytic efficiency (0.92) and off-target activity (0.84), as well as target windows (0.73) and catalytic motifs (0.78). We applied the trained model to predict the above catalytic features of 21,335 CDs in UniProt, and subsampling of 28 CDs further validated its prediction accuracy (0.84, 0.87, 0.75, 0.73, respectively). Alanine scanning-based mutagenesis was then employed to reduce off-targets in one example CD, which produced a remarkably high-fidelity, high-efficiency cytosine base editor, thus demonstrating AlphaCD application in high-accuracy, high-throughput protein functional characterization, and providing a strategy for accelerated characterization of other proteins.

Classification number:
