2021
DOI: 10.3390/ijms22147668

Comparison of De Novo Assembly Strategies for Bacterial Genomes

Abstract: (1) Background: Short-read sequencing allows for the rapid and accurate analysis of the whole bacterial genome but does not usually enable complete genome assembly. Long-read sequencing greatly assists with the resolution of complex bacterial genomes, particularly when combined with short-read Illumina data. However, it is not clear how different assembly strategies affect genomic accuracy, completeness, and protein prediction. (2) Methods: we compare different assembly strategies for Haemophilus parasuis, whi…

Cited by 25 publications (18 citation statements)
References 32 publications
“…However, in some cases a relevant reference genome is not known. In this case, a de novo genome assembly is required, involving the construction of a complete nucleotide sequence without a reference [ 1 ]. This task is extremely difficult if using next-generation sequencing (NGS), because the typical lengths of reads are hundreds of nucleotides long [ 2 ].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Short-read-first hybrid assemblies were generated using Unicycler v0.4.9b, which starts by building a short-read assembly graph with SPAdes v3.14.0, then uses the corresponding long reads (HAC basecalled, in this case) to scaffold the genome, and finally runs Pilon v1.23 [58] in an attempt to fill gaps, correct bases and fix misassemblies using the short reads [26, 54] (HAC basecalled reads were used for the short-read-first assemblies as these were generated before the SUP basecalling model became available [50]). To generate long-read-first assemblies [59], we used Flye v2.8-1 to produce ONT-only assemblies for the set of reads basecalled with the HAC and SUP-accuracy basecalling models, followed by long-read polishing with Medaka [30] to repair any residual errors using ONT long reads, then finally short-read polishing using Illumina reads and Pilon v1.24 (following the recommendations noted at [60]). Thus, altogether, we produced ten assemblies per sample, encompassing reads derived from the three separate basecalling models ( Figure 1 ).…”
Section: Methods
Citation type: mentioning
confidence: 99%
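
The two strategies described in the statement above differ mainly in the order in which short and long reads are used. As a minimal sketch, the stage order can be written down as plain data. The `Stage` class and the names `SHORT_READ_FIRST`, `LONG_READ_FIRST`, and `describe` are hypothetical, introduced only for illustration; the tool names and roles are taken from the quoted text, and no command-line invocation of any tool is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    tool: str   # tool named in the quoted statement
    reads: str  # read type consumed at this stage: "short" or "long"
    role: str   # what the stage contributes, per the quoted statement


# Short-read-first hybrid assembly; per the quote, Unicycler drives all three stages.
SHORT_READ_FIRST = (
    Stage("SPAdes", "short", "build the short-read assembly graph"),
    Stage("Unicycler", "long", "scaffold the graph with long reads"),
    Stage("Pilon", "short", "fill gaps, correct bases, fix misassemblies"),
)

# Long-read-first assembly: ONT-only draft, then two rounds of polishing.
LONG_READ_FIRST = (
    Stage("Flye", "long", "produce an ONT-only draft assembly"),
    Stage("Medaka", "long", "polish the draft with long reads"),
    Stage("Pilon", "short", "polish again with Illumina short reads"),
)


def describe(name, stages):
    """Return a numbered, human-readable summary of a strategy."""
    lines = [f"{name}:"]
    for i, stage in enumerate(stages, start=1):
        lines.append(f"  {i}. {stage.tool} ({stage.reads} reads): {stage.role}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe("Short-read-first", SHORT_READ_FIRST))
    print(describe("Long-read-first", LONG_READ_FIRST))
```

Both strategies end with short-read polishing in Pilon; what changes is which read type builds the initial assembly.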
“…Thanks to several machine-learning algorithms, these errors have been significantly reduced by read-based (e.g., Medaka) or reference-based (e.g., Homopolish) polishing methods [5]. These algorithmic advances have produced high-quality ONT genomes sufficient for downstream analysis (e.g., >Q50) [3, 6].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
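
For readers unfamiliar with the ">Q50" figure in the statement above: a Phred-scaled quality Q corresponds to a per-base error probability p through Q = -10 log10(p), so Q50 means roughly one error per 100,000 bases. The sketch below shows that conversion. The function names and the 2,000,000 bp genome length are assumptions chosen for illustration and do not come from the cited papers.

```python
import math


def phred_to_error_prob(q):
    """Per-base error probability for a Phred-scaled quality q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred-scaled quality for a per-base error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def expected_errors(q, genome_length):
    """Expected number of erroneous bases in a genome of the given length."""
    return genome_length * phred_to_error_prob(q)


if __name__ == "__main__":
    hypothetical_genome_length = 2_000_000  # assumed size, for illustration only
    for q in (30, 40, 50):
        errors = expected_errors(q, hypothetical_genome_length)
        print(f"Q{q}: p = {phred_to_error_prob(q):.0e}, about {errors:.0f} errors")
```

Under these assumptions, moving from Q40 to Q50 cuts the expected error count tenfold, which is why consensus quality is usually reported on this logarithmic scale.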