Comparing Long-Read Assemblers to Explore the Potential of a Sustainable Low-Cost, Low-Infrastructure Approach to Sequence Antimicrobial Resistant Bacteria With Oxford Nanopore Sequencing

Boostrom, Ian; Portal, Edward; Spiller, Owen B.; Walsh, Timothy R.; Sands, Kirsty

doi:10.3389/fmicb.2022.796465

Cited by 17 publications (21 citation statements)

References 49 publications


“…By contrast, Canu recovered 96 % of plasmids in replicate assemblies, followed by Flye-raw (91 %), Flye-meta (90 %), Flye-hq (88 %), Miniasm (85 %), and Raven (75 %). These findings support previous benchmarking studies which found that Canu and Flye perform best among long-read assemblers in terms of plasmid recovery [6, 11], but indicate that even with the Flye 2.9 update, continued improvements to both assemblers are needed to ensure that all plasmid sequences are recovered.…”

Section: Results (supporting), confidence: 88%

“…Assemblies generated from Illumina short-read sequences have a low error rate but are often highly fragmented, thus making it difficult to confidently differentiate plasmidic sequences from chromosomal sequences. By contrast, long-read sequencing technologies, Oxford Nanopore Technologies (ONT) and PacBio, often produce assemblies that are structurally complete but sometimes miss plasmids due to biases introduced during library preparation [4,5] and/or issues with long-read genome assemblers [6][7][8]. To overcome these issues, a hybrid approach can be employed, which leverages the strengths of both long- and short-read sequencing technologies to produce assemblies that are highly accurate and structurally complete [4,9].…”

Section: Introduction (mentioning), confidence: 99%


Long read genome assemblers struggle with small plasmids

2023


Whole-genome sequencing has become a preferred method for studying bacterial plasmids, as it is generally assumed to capture the entire genome. However, long-read genome assemblers have been shown to sometimes miss plasmid sequences – an issue that has been associated with plasmid size. The purpose of this study was to investigate the relationship between plasmid size and plasmid recovery by the long-read-only assemblers, Flye, Raven, Miniasm, and Canu. This was accomplished by determining the number of times each assembler successfully recovered 33 plasmids, ranging from 1919 to 194 062 bp in size and belonging to 14 bacterial isolates from six bacterial genera, using Oxford Nanopore long reads. These results were additionally compared to plasmid recovery rates by the short-read-first assembler, Unicycler, using both Oxford Nanopore long reads and Illumina short reads. Results from this study indicate that Canu, Flye, Miniasm, and Raven are prone to missing plasmid sequences, whereas Unicycler was successful at recovering 100 % of plasmid sequences. Excluding Canu, most plasmid loss by long-read-only assemblers was due to failure to recover plasmids smaller than 10 kb. As such, it is recommended that Unicycler be used to increase the likelihood of plasmid recovery during bacterial genome assembly.
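The recovery percentages quoted earlier (e.g. Canu 96 %, Raven 75 %) are, in effect, the share of (plasmid, replicate assembly) pairs in which an assembler recovered a given plasmid, and this abstract further stratifies losses at 10 kb. A minimal sketch of that bookkeeping follows, assuming a hypothetical list of per-replicate recovery calls; the `RecoveryCall` record, its fields and the toy values are illustrative only, not data from either study, and the criterion for calling a plasmid "recovered" (for example, identity and coverage against a reference) is deliberately left outside the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass

SMALL_PLASMID_BP = 10_000  # size cut-off discussed in the abstract (10 kb)


@dataclass(frozen=True)
class RecoveryCall:
    """Hypothetical observation: did one assembler recover one plasmid in one replicate?"""
    assembler: str
    plasmid_id: str
    plasmid_bp: int
    replicate: int
    recovered: bool


def recovery_rates(calls):
    """Return {assembler: {"all": pct, "<10kb": pct, ">=10kb": pct}}.

    Each percentage is recovered / attempted over all (plasmid, replicate)
    pairs in that stratum; strata with no observations are reported as None.
    """
    # counts[assembler][stratum] = [number recovered, number attempted]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for c in calls:
        size_class = "<10kb" if c.plasmid_bp < SMALL_PLASMID_BP else ">=10kb"
        for stratum in ("all", size_class):
            tally = counts[c.assembler][stratum]
            tally[0] += int(c.recovered)
            tally[1] += 1

    result = {}
    for assembler, strata in counts.items():
        result[assembler] = {
            s: (100.0 * strata[s][0] / strata[s][1] if strata[s][1] else None)
            for s in ("all", "<10kb", ">=10kb")
        }
    return result


if __name__ == "__main__":
    # Toy data: one small and one large plasmid, two replicates each.
    toy = [
        RecoveryCall("AssemblerA", "p1", 4_000, 1, False),
        RecoveryCall("AssemblerA", "p1", 4_000, 2, True),
        RecoveryCall("AssemblerA", "p2", 120_000, 1, True),
        RecoveryCall("AssemblerA", "p2", 120_000, 2, True),
    ]
    print(recovery_rates(toy)["AssemblerA"])
    # prints {'all': 75.0, '<10kb': 50.0, '>=10kb': 100.0}
```

Reporting the size strata separately makes it visible when an assembler's overall rate is dragged down almost entirely by plasmids under 10 kb, which is the pattern the abstract describes for Flye, Miniasm and Raven.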


“…For a 5 Mb genome, such as K. pneumoniae, we previously reported roughly 3000 errors (corresponding to at least 337 substitutions per genome assembly) [38]; therefore, ONT-only assemblies are prone to erroneous SNP calls, which can potentially hamper outbreak investigations [39]. The performance of ONT-only assemblies in identifying AMR and virulence traits against the gold standard (hybrid assemblies) is unclear but is gaining attention [40][41][42], and is pertinent to inform the research community working with long-read data on what levels of accuracy to expect with current basecalling algorithms.…”

Section: Impact Statement (mentioning), confidence: 99%
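To put the quoted error count on a per-base footing, assume (for illustration only) that the roughly 3000 errors are spread over the 5 Mb K. pneumoniae assembly, and convert the resulting error rate to a Phred-scaled quality; this conversion is not made in the quoted text.

```latex
\begin{aligned}
\hat{p}_{\text{err}} &= \frac{3000}{5 \times 10^{6}\ \text{bp}} = 6 \times 10^{-4} \\
\text{accuracy} &= 1 - \hat{p}_{\text{err}} = 99.94\% \\
Q &= -10 \log_{10} \hat{p}_{\text{err}} = -10 \log_{10}\!\left(6 \times 10^{-4}\right) \approx 32.2
\end{aligned}
```

So an assembly that looks highly accurate per base (about Q32) still carries thousands of errors genome-wide, and a few hundred of them surfacing as substitutions is enough to distort SNP-based outbreak comparisons.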

Nanopore-only assemblies for genomic surveillance of the global priority drug-resistant pathogen, Klebsiella pneumoniae

et al. 2023


Oxford Nanopore Technologies (ONT) sequencing has rich potential for genomic epidemiology and public health investigations of bacterial pathogens, particularly in low-resource settings and at the point of care, due to its portability and affordability. However, low base-call accuracy has limited the reliability of ONT data for critical tasks such as antimicrobial resistance (AMR) and virulence gene detection and typing, serotype prediction, and cluster identification. Thus, Illumina sequencing remains the standard for genomic surveillance despite higher capital and running costs. We tested the accuracy of ONT-only assemblies for common applied bacterial genomics tasks (genotyping and cluster detection, implemented via Kleborate, Kaptive and Pathogenwatch), using data from 54 unique Klebsiella pneumoniae isolates. ONT reads generated via MinION with R9.4.1 flowcells were basecalled using three alternative models [Fast, High-accuracy (HAC) and Super-accuracy (SUP), available within ONT’s Guppy software], assembled with Flye and polished using Medaka. Accuracy of typing using ONT-only assemblies was compared with that of Illumina-only and hybrid ONT+Illumina assemblies, constructed from the same isolates as reference standards. The most resource-intensive ONT-assembly approach (SUP basecalling, with or without Medaka polishing) performed best, yielding reliable capsule (K) type calls for all strains (100 % exact or best matching locus), reliable multi-locus sequence type (MLST) assignment (98.3 % exact match or single-locus variants), and good detection of acquired AMR genes and mutations (88–100 % correct identification across the various drug classes). Distance-based trees generated from SUP+Medaka assemblies accurately reflected overall genetic relationships between isolates. The definition of outbreak clusters from ONT-only assemblies was problematic due to inflation of SNP counts by high base-call errors. 
However, ONT data could be reliably used to ‘rule out’ isolates of distinct lineages from suspected transmission clusters. HAC basecalling + Medaka polishing performed similarly to SUP basecalling without polishing. Therefore, we recommend investing compute resources into basecalling (SUP model), wherever compute resources and time allow, and note that polishing is also worthwhile for improved performance. Overall, our results show that MLST, K type and AMR determinants can be reliably identified with ONT-only R9.4.1 flowcell data. However, cluster detection remains challenging with this technology.
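The abstract's asymmetry, where ONT-only data can reliably rule isolates out of a suspected cluster but cannot reliably define clusters, can be illustrated with a simple thresholding sketch. Everything here is hypothetical: sequences are assumed to be pre-aligned to equal length, `cluster_threshold` is a user-chosen SNP cut-off and `error_margin` is an allowance for spurious SNPs from base-call errors. Neither value comes from the study, whose cluster detection was implemented via Pathogenwatch rather than this code.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count mismatches between two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped, so only confident base differences are counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignored = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignored and b not in ignored
    )


def classify_pair(distance: int, cluster_threshold: int, error_margin: int) -> str:
    """Hypothetical rule-out heuristic for distances from ONT-only assemblies.

    Base-call errors inflate SNP counts, so distances near the threshold are
    ambiguous; only a distance exceeding the threshold by more than the
    assumed error margin is treated as evidence of distinct lineages.
    """
    if distance > cluster_threshold + error_margin:
        return "ruled out"
    return "not ruled out (confirm with higher-accuracy data)"
```

For example, with a cut-off of 20 SNPs and a 10-SNP error margin, a pair 50 SNPs apart is ruled out, while a pair 25 SNPs apart is left for confirmation, mirroring the "rule out, but do not rule in" use the abstract recommends.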


“…Many software tools and algorithms have been developed for bacterial genome assembly, such as Canu [4], Flye [5], and Wtdbg2 [6]. They have relative advantages and disadvantages as well as varying performance and assembly outcomes, but in terms of overall performance, Flye and Raven [7] stand out as the best bacterial genome assemblers [8][9][10]. Nanopore sequencing data are characterized by indels, non-random systematic errors [11], and assembly errors spanning hundreds of bases [8], which may lead to inaccurate or incomplete assemblies.…”

Section: Introduction (mentioning), confidence: 99%

MAECI: A pipeline for generating consensus sequence with nanopore sequencing long-read assembly and error correction

Lang

2022

PLoS ONE


Nanopore sequencing produces long reads and offers unique advantages over next-generation sequencing, especially for the assembly of draft bacterial genomes with improved completeness. However, assembly errors can occur due to data characteristics and assembly algorithms. To address these issues, we developed MAECI, a pipeline for generating consensus sequences from multiple assemblies of the same nanopore sequencing data and error correction. Systematic evaluation showed that MAECI is an efficient and effective pipeline to improve the accuracy and completeness of bacterial genome assemblies. The available codes and implementation are at https://github.com/langjidong/MAECI.
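The abstract describes building a consensus from multiple assemblies of the same nanopore data. The sketch below shows the simplest form of that idea, a per-column majority vote over assemblies that are assumed to be already aligned to one another. It is a generic stand-in, not MAECI's actual algorithm (the linked repository holds that), and its tie-breaking rule is an arbitrary choice made here for illustration.

```python
from collections import Counter


def majority_consensus(aligned_assemblies: list[str]) -> str:
    """Per-column majority vote over pre-aligned assemblies of equal length.

    Gap characters ('-') take part in the vote; if a gap wins, the column is
    dropped from the consensus. Ties are broken in favour of the symbol seen
    first (i.e. from the earliest assembly) among the tied candidates.
    """
    if not aligned_assemblies:
        raise ValueError("need at least one assembly")
    length = len(aligned_assemblies[0])
    if any(len(seq) != length for seq in aligned_assemblies):
        raise ValueError("assemblies must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_assemblies):
        counts = Counter(column)
        top = max(counts.values())
        tied = {symbol for symbol, n in counts.items() if n == top}
        # Prefer the earliest assembly's symbol among the tied candidates.
        winner = next(symbol for symbol in column if symbol in tied)
        if winner != "-":
            consensus.append(winner)
    return "".join(consensus)
```

With three assemblies `"ACGT-A"`, `"ACTTGA"` and `"ACGT-A"`, the lone `T` substitution and the lone `G` insertion in the second assembly are both outvoted, giving `"ACGTA"`. Real pipelines must first solve the harder problem of aligning whole assemblies, which this sketch takes as given.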

