RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions

Document type: Foreign-language journal article

First author: Yang, Jinbao

Authors: Yang, Jinbao; Pan, Weihua; Zhao, Xianjia; Jiang, Heling; Yang, Yingxue; Hou, Yuze

Author affiliations:

Journal: HORTICULTURE RESEARCH (impact factor: 8.7; five-year impact factor: 9.0)

ISSN: 2662-6810

Year, volume, issue: 2023, vol. 10, no. 1

Pages:

Indexed in: SCI

Abstract: Telomere-to-telomere (T2T) assembly relies on the correctness of sequence alignments. However, existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions, which impedes the generation of T2T-level reference genomes for more important species. In this paper, we present an automatic algorithm called RAfilter for removing the false positives in the outputs of existing aligners. RAfilter takes advantage of rare k-mers, which represent copy-specific features, to differentiate false-positive alignments from correct ones. Considering the huge number of rare k-mers in large eukaryotic genomes, a series of high-performance computing techniques such as multi-threading and bit operations are used to improve time and space efficiency. Experimental results on tandem repeats and interspersed repeats show that RAfilter was able to filter 60%-90% of false-positive HiFi alignments with almost no correct ones removed, while the sensitivity and precision on ONT datasets were about 80% and 50%, respectively.
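
The rare k-mer idea in the abstract can be illustrated with a small sketch. This is not RAfilter's actual implementation: the function names (`kmer_codes`, `rare_kmers`, `rare_support`), the rarity cutoff `max_count=1`, and the support score are all assumptions made for illustration. The sketch packs each k-mer into an integer with 2-bit encoding and bit operations, treats k-mers that occur at most once in the reference as copy-specific, and scores a candidate alignment by the fraction of rare k-mers in the aligned reference region that also occur in the read.

```python
from collections import Counter

# 2-bit encoding of nucleotides (an assumed, conventional choice).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq, k):
    """Yield (start, code) for each k-mer, packing 2 bits per base.

    The code is updated with a shift, OR and mask instead of re-reading
    the whole k-mer; a non-ACGT base resets the window.
    """
    mask = (1 << (2 * k)) - 1
    code = 0
    valid = 0
    for i, base in enumerate(seq):
        b = ENCODE.get(base)
        if b is None:
            code, valid = 0, 0
            continue
        code = ((code << 2) | b) & mask
        valid += 1
        if valid >= k:
            yield i - k + 1, code


def rare_kmers(reference, k, max_count=1):
    """Return the set of k-mer codes seen at most max_count times (hypothetical cutoff)."""
    counts = Counter(code for _, code in kmer_codes(reference, k))
    return {code for code, n in counts.items() if n <= max_count}


def rare_support(read, reference, ref_start, ref_end, k, rare):
    """Fraction of rare k-mers in reference[ref_start:ref_end] that also occur in the read.

    Returns None when the region holds no rare k-mers, so no call can be made.
    """
    region = reference[ref_start:ref_end]
    expected = {code for _, code in kmer_codes(region, k) if code in rare}
    if not expected:
        return None
    observed = {code for _, code in kmer_codes(read, k)}
    return len(expected & observed) / len(expected)


# Toy example: two repeat copies that differ at a single base.
copy1 = "ACGTTGCAAGCTTAGGCATC"
copy2 = "ACGTTGCAAGGTTAGGCATC"
reference = copy1 + copy2
read = copy2  # an error-free read drawn from the second copy

rare = rare_kmers(reference, k=5)
print(rare_support(read, reference, 20, 40, 5, rare))  # 1.0 (alignment to the true copy)
print(rare_support(read, reference, 0, 20, 5, rare))   # 0.0 (alignment to the other copy)
```

In this toy case an alignment with low support would be flagged as a likely false positive. A real filter would also need to tolerate sequencing errors, which may be one reason the abstract reports different results for HiFi and ONT reads.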

Classification number:
