Comparison of De Novo Assembly Strategies for Bacterial Genomes

Zhang, Pengfei; Jiang, Dike; Wang, Yin; Yao, Xueping; Luo, Yan; Yang, Zexiao

doi:10.3390/ijms22147668

Cited by 25 publications

(18 citation statements)

References 32 publications


“…However, in some cases a relevant reference genome is not known. In this case, a de novo genome assembly is required, involving the construction of a complete nucleotide sequence without a reference [1]. This task is extremely difficult if using next-generation sequencing (NGS), because the typical lengths of reads are hundreds of nucleotides long [2].…”

Section: Introduction (mentioning)

confidence: 99%

Nanopore Sequencing for De Novo Bacterial Genome Assembly and Search for Single-Nucleotide Polymorphism

Khrenova, Panova, Rodin, et al. 2022. IJMS


Nanopore sequencing (ONT) is a new and rapidly developing method for determining nucleotide sequences in DNA and RNA. Unlike next-generation sequencing, it yields long reads of thousands of nucleotides without assembly or amplification during sequencing. Nanopore sequencing can help determine the genetic changes that lead to antibiotic resistance. This study presents the application of ONT technology to the assembly of an E. coli genome characterized by a deletion of the tolC gene and known single-nucleotide variations leading to antibiotic resistance, in the absence of a reference genome. We performed benchmark studies to determine the minimum coverage depth needed to obtain a complete genome, depending on the quality of the ONT data. A comparison of existing programs was carried out. It was shown that the Flye program produces plausible assembly results relative to the others (Shasta, Canu, and Necat). The coverage depth required for successful assembly depends strongly on read length. For high-quality samples with an average read length of 8 Kbp or more, a coverage depth of 30× is sufficient to assemble the complete genome de novo and reliably determine single-nucleotide variations in it. For samples with shorter reads, with mean lengths of 2 Kbp, a higher coverage depth of 50× is required. Mechanical mixing must be avoided during sample preparation. Nanopore sequencing can be used alone to determine the genetic features of bacterial strains that confer antibiotic resistance.
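The coverage figures in this abstract can be turned into rough read counts. The following is a minimal sketch, not the authors' method: it uses the standard definition of coverage depth (total sequenced bases divided by genome size), and the genome size of 4.6 Mbp is an assumed approximate value for E. coli that the abstract does not state. The function names are hypothetical.

```python
import math


def coverage_depth(total_bases: int, genome_size: int) -> float:
    """Mean coverage depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def reads_needed(target_depth: float, genome_size: int, mean_read_length: int) -> int:
    """Approximate number of reads required to reach a target mean depth."""
    return math.ceil(target_depth * genome_size / mean_read_length)


# Assumption: approximate E. coli genome size in bp (not given in the abstract).
GENOME_SIZE = 4_600_000

# Depth/read-length pairs quoted in the abstract: 30x at 8 Kbp, 50x at 2 Kbp.
for depth, read_len in [(30, 8_000), (50, 2_000)]:
    n = reads_needed(depth, GENOME_SIZE, read_len)
    print(f"{depth}x at mean read length {read_len} bp: about {n} reads")
```

Under these assumptions the shorter-read sample needs several times more reads than the long-read one, because both the required depth and the number of reads per unit of depth go up.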



“…Short-read-first hybrid assemblies were generated using Unicycler v0.4.9b, which starts by building a short-read assembly graph with SPAdes v3.14.0, then uses the corresponding long reads (HAC basecalled, in this case) to scaffold the genome, and finally runs Pilon v1.23 [58] in an attempt to fill gaps, correct bases and fix misassemblies using the short reads [26, 54] (HAC basecalled reads were used for the short-read-first assemblies as these were generated before the SUP basecalling model became available [50]). To generate long-read-first assemblies [59], we used Flye v2.8-1 to produce ONT-only assemblies for the set of reads basecalled with the HAC and SUP-accuracy basecalling models, followed by long-read polishing with Medaka [30] to repair any residual errors using ONT long reads, then finally short-read polishing using Illumina reads and Pilon v1.24 (following the recommendations noted at [60]). Thus, altogether, we produced ten assemblies per sample, encompassing reads derived from the three separate basecalling models (Figure 1).…”

Section: Methods (mentioning)

confidence: 99%

Nanopore-only assemblies for genomic surveillance of the global priority drug-resistant pathogen, Klebsiella pneumoniae

Foster-Nyarko, Cottingham, Wick, et al. 2022. Preprint


Background: Oxford Nanopore Technologies (ONT) sequencing has rich potential for genomic epidemiology and public health investigations of bacterial pathogens, particularly in low-resource settings and at the point of care, due to its portability and affordability. However, low base-call accuracy has limited the reliability of ONT data for critical tasks such as antimicrobial resistance (AMR) and virulence gene detection and typing, serotype prediction and cluster identification. Thus, Illumina sequencing remains the standard for genomic surveillance despite higher capital and running costs.

Methods: We tested the accuracy of ONT-only assemblies for common applied bacterial genomics tasks (genotyping and cluster detection, implemented via Kleborate, Kaptive and Pathogenwatch), using data from 54 unique Klebsiella pneumoniae isolates. ONT reads generated via MinION with R9.4 flowcells were basecalled using three alternative models (Fast, High-accuracy (HAC) and Super-accuracy (SUP), available within ONT's Guppy software), assembled with Flye and polished using Medaka. Accuracy of typing using ONT-only assemblies was compared with that of Illumina-only and hybrid ONT+Illumina assemblies, constructed from the same isolates as reference standards.

Results: The most resource-intensive ONT-assembly approach (SUP basecalling, with or without Medaka polishing) performed best, yielding reliable capsule (K) type calls for all strains (100% exact or best matching locus), reliable multi-locus sequence type (MLST) assignment (98.3% exact match or single-locus variants), and good detection of acquired AMR genes and mutations (88%–100% correct identification across the various drug classes). Distance-based trees generated from SUP+Medaka assemblies accurately reflected overall genetic relationships between isolates; however, the definition of outbreak clusters from ONT-only assemblies was problematic. HAC basecalling + Medaka polishing performed similarly to SUP basecalling without polishing, and polishing introduced errors into HAC- or Fast-basecalled assemblies. Therefore, we recommend investing compute resources into basecalling (SUP model) over polishing, where compute resources and/or time are limiting.

Conclusions: Overall, our results show that MLST, K type and AMR determinants can be reliably identified with ONT-only data. However, cluster detection remains challenging with this technology.


“…Thanks to several machine-learning algorithms, these errors have been significantly reduced by read-based (e.g., Medaka) or reference-based (e.g., Homopolish) polishing methods [5]. These algorithmic advances have produced high-quality ONT genomes sufficient for downstream analysis (e.g., >Q50) [3, 6].…”

Section: Introduction (mentioning)

confidence: 99%

Correcting Modification-Mediated Errors in Nanopore Sequencing by Nucleotide Demodification and in silico Correction

Chiou, Chen, Wang, et al. 2022. Preprint


The accuracy of Oxford Nanopore Technologies (ONT) sequencing has significantly improved thanks to new flowcells, sequencing kits, and basecalling algorithms. However, novel modifications that the basecalling models were not trained on can seriously reduce the quality. This paper reports a set of ONT-sequenced genomes with unexpectedly low quality (∼Q30) due to extensive new modifications. Demodification by whole-genome amplification (WGA) significantly improved the quality of all genomes (∼Q50-60), at the cost of losing the epigenome. We developed a computational method, Modpolish, for correcting modification-mediated errors without WGA. Modpolish produced high-quality genomes and uncovered the underlying modification motifs without loss of the epigenome. Our results suggest that novel modifications are prone to ONT errors, which are correctable by WGA or Modpolish without additional short-read sequencing.
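To put the Q30 and Q50 figures in perspective, the sketch below converts Phred-scaled quality values to per-base error rates using the standard relation Q = -10 log10(p). The 5 Mbp genome size is a hypothetical round number chosen for illustration, not a value from the abstract, and the function names are likewise illustrative.

```python
def phred_to_error_rate(q: float) -> float:
    """Per-base error probability for a Phred-scaled quality Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def expected_errors(q: float, genome_size: int) -> float:
    """Expected number of erroneous bases in a genome of the given size."""
    return phred_to_error_rate(q) * genome_size


# Assumption: a hypothetical 5 Mbp bacterial genome, for illustration only.
GENOME_SIZE = 5_000_000

for q in (30, 50, 60):
    errors = expected_errors(q, GENOME_SIZE)
    print(f"Q{q}: error rate {phred_to_error_rate(q):.0e}, about {errors:.0f} errors")
```

For a genome of this assumed size, Q30 corresponds to roughly 5,000 erroneous bases and Q50 to roughly 50, which is why the difference matters for downstream analysis.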

