LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly

Document type: Foreign-language journal article

First author: Xu, Gui-Cai

Authors: Xu, Gui-Cai; Zhu, Rui; Zhang, Yan; Li, Shang-Qi; Wang, Hong-Wei; Li, Jiong-Tang; Xu, Tian-Jun

Author affiliations:

Keywords: gap-closure; genome assembly; third-generation sequencing; next-generation sequencing; repetitive elements

Journal: GIGASCIENCE (Impact factor: 6.524; five-year impact factor: 8.702)

ISSN: 2047-217X

Year/Volume/Issue: 2019, Vol. 8, No. 1

Pages:

Indexed in: SCI

Abstract:

Background: Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, remain unfinished and contain a number of gaps. Long reads from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity, but current long-read gap-closure approaches require long runtimes and high memory usage. A fast, memory-efficient approach using long reads is therefore needed to obtain complete genomes.

Findings: We developed LR_Gapcloser, a tool that uses long reads from TGS platforms to close gaps in genome assemblies rapidly and efficiently. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed more gaps, ran faster, had a lower error rate, and used much less memory than two existing state-of-the-art tools. It filled more gaps with raw reads than with error-corrected reads. It is applicable to assemblies produced by different approaches and to large, complex genomes. After gap closure with this tool, the contig N50 of the human CHM1 genome increased from 143 kb to 19 Mb, a 132-fold increase. We also closed gaps in the genome of Triticum urartu, a large, repeat-rich genome, increasing its contig N50 by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies that combine the optimal TGS-based and next-generation sequencing (NGS)-based assemblers with LR_Gapcloser. The optimal hybrid strategy we propose generated a new human CHM1 genome assembly with markedly improved contiguity: its contig N50 exceeded 28 Mb, larger than that of previous non-reference assemblies of the diploid human genome.

Conclusions: LR_Gapcloser is a fast and efficient tool for closing gaps and improving the contiguity of genome assemblies. The proposed hybrid assembly strategy incorporating this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.
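Contig N50, the metric used to report the CHM1 and Triticum urartu improvements, is the length L such that contigs of length L or longer together cover at least half of the total assembled bases. The minimal Python sketch below shows why closing gaps raises it. It assumes gaps are encoded as runs of N inside scaffold sequences (a common convention that this record does not specify), and the function names are hypothetical illustrations, not LR_Gapcloser code.

```python
from __future__ import annotations

import re


def contigs_from_scaffold(seq: str, min_gap: int = 1) -> list[str]:
    """Split a scaffold into contigs at runs of N (assumed gap encoding)."""
    pattern = "[Nn]{%d,}" % min_gap
    # Drop empty strings produced by leading/trailing gaps.
    return [c for c in re.split(pattern, seq) if c]


def n50(lengths: list[int]) -> int:
    """Largest length L such that pieces of length >= L cover at least half the total."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # not reached for non-empty input


def contig_n50(scaffolds: list[str]) -> int:
    """Contig N50 of a set of scaffolds, after breaking them at gaps."""
    lengths = [len(c) for s in scaffolds for c in contigs_from_scaffold(s)]
    return n50(lengths)


if __name__ == "__main__":
    # Toy scaffolds with two gaps (hypothetical data).
    before = ["A" * 10 + "N" * 5 + "C" * 4, "G" * 6 + "N" * 2 + "T" * 2]
    # Hypothetical "closed" version: every gap base replaced by a filled base.
    after = [s.replace("N", "A") for s in before]
    print(contig_n50(before), contig_n50(after))  # prints "6 19"
```

Before closure the toy scaffolds break into contigs of 10, 6, 4, and 2 bases (N50 = 6); once the gaps are filled each scaffold becomes a single contig of 19 or 10 bases (N50 = 19). Real assemblies may require a minimum N-run length to count as a gap, which is what the `min_gap` parameter is meant to model.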

Classification number:
