2022
DOI: 10.3389/fmicb.2022.796465

Comparing Long-Read Assemblers to Explore the Potential of a Sustainable Low-Cost, Low-Infrastructure Approach to Sequence Antimicrobial Resistant Bacteria With Oxford Nanopore Sequencing

Abstract: Long-read sequencing (LRS) can resolve repetitive regions, a limitation of short read (SR) data. Reduced cost and instrument size has led to a steady increase in LRS across diagnostics and research. Here, we re-basecalled FAST5 data sequenced between 2018 and 2021 and analyzed the data in relation to gDNA across a large dataset (n = 200) spanning a wide GC content (25–67%). We examined whether re-basecalled data would improve the hybrid assembly, and, for a smaller cohort, compared long read (LR) assemblies in…

Cited by 17 publications (21 citation statements)
References 49 publications

“…By contrast, Canu recovered 96 % of plasmids in replicate assemblies, followed by Flye-raw (91 %), Flye-meta (90 %), Flye-hq (88 %), Miniasm (85 %), and Raven (75 %). These findings support previous benchmarking studies which found that Canu and Flye perform best among long read assemblers, in terms of plasmid recovery [6, 11] but indicate that even with the Flye 2.9 update, continued improvements to both assemblers are needed to ensure that all plasmid sequences are recovered.…”
Section: Results
Citation type: supporting
confidence: 88%
“…Assemblies generated from Illumina short-read sequences have a low error rate but are often highly fragmented, thus making it difficult to confidently differentiate plasmidic sequences from chromosomal sequences. By contrast, long-read sequencing technologies, Oxford Nanopore Technologies (ONT) and PacBio, often produce assemblies that are structurally complete but sometimes miss plasmids due to biases introduced during library preparation [4,5]; and/or issues with long-read genome assemblers [6–8]. To overcome these issues, a hybrid approach can be employed, which leverages the strengths of both long- and short-read sequencing technologies, to produce assemblies that are highly accurate and structurally complete [4,9].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…For a 5 Mb genome, such as K. pneumoniae, we previously reported roughly 3000 errors (corresponding to at least 337 substitutions per genome assembly) [38]; therefore, ONT-only assemblies are prone to erroneous SNP calls, which can potentially hamper outbreak investigations [39]. The performance of ONT-only assemblies in identifying AMR and virulence traits against the gold-standard (hybrid assemblies) is unclear but is gaining attention [40–42], and is pertinent to inform the research community working with long-read data on what levels of accuracy to expect with current basecalling algorithms.…”
Section: Impact Statement
Citation type: mentioning
confidence: 99%
“…Many software tools and algorithms have been developed for bacterial genome assembly, such as Canu [4], Flye [5], and Wtdbg2 [6]. They have relative advantages and disadvantages as well as varying performance and assembly outcomes, but in terms of overall performance, Flye and Raven [7] stand out as the best bacterial genome assemblers [8–10]. Nanopore sequencing data are characterized by the presence of indels, non-random systematic errors [11] and the occurrence of assembly errors spanning hundreds of bases [8], which may lead to inaccurate or incomplete assemblies.…”
Section: Introduction
Citation type: mentioning
confidence: 99%