2021
DOI: 10.1093/bib/bbab102

Choice of assemblers has a critical impact on de novo assembly of SARS-CoV-2 genome and characterizing variants

Abstract: Background: Coronavirus Disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global pandemic following its initial emergence in China. SARS-CoV-2 has a positive-sense single-stranded RNA virus genome of around 30 kb. Using next-generation sequencing technologies, a large number of SARS-CoV-2 genomes are being sequenced at an unprecedented rate and being deposited in public repositories. For the de novo assembly of the SARS-CoV-2 genomes, …

Cited by 13 publications (9 citation statements)
References 29 publications
“…The choice of assembler is significant too. MEGAHIT normally outperforms other assemblers in speed [29]. Natively, Captus can handle the tasks following user-specified arguments of threads and parallelization, which is not the case for HybPiper and SECAPR.…”
Section: Discussion (mentioning)
confidence: 99%
“…A previous study reported the effects of different genome assemblies built from the same data on gene annotation and SNP annotations, which showed the importance of a high-quality genome assembly and consensus in the data inputs for downstream analysis (Florea et al, 2011). Another study on acute respiratory syndrome coronavirus 2 (SARS-CoV-2) also indicated that the choice of assemblers plays a significant role in the detection of variants and a number of variants present in assemblies are unique to the assembly methods (Islam et al, 2021). Hence, as sequencing methods are diversifying and increasingly new assembly algorithms are being developed, future genome downstream analysis such as detection of SVs should take the same data input into consideration to eliminate differences resulting from the data and assembly methods.…”
Section: Discussion (mentioning)
confidence: 99%
“…where π(•) is a row vector and ∑ π(•) = 1. We explore how many mutation steps will converge by computing the information entropy H(x) with Equations (8) and (9).…”
Section: Proof Of Convergence Interval Of the Mutation Simulation Model (mentioning)
confidence: 99%
“…In terms of SARS-CoV-2 intra-host mutation spectra computation, although recent studies already analyzed the intra-host mutation spectra of SARS-CoV-2 [7,8], they neither employed dynamics thresholds to filter low-quality data during raw sequencing data processing nor considered data specificity. Furthermore, since most of these computational methods are based on next-generation sequencing data, which is inherently deficient in short read-length and the need of amplification, thus it is difficult to accurately distinguish positive- and negative-sense sub-genomes or to further investigate the strand specificity of SARS-CoV-2 intra-host mutants [5,9]. Therefore, our first scientific question is how to develop such a data specificity-based base filtering algorithm for SARS-CoV-2 sequencing data that can significantly improve the accuracy of strand-specific mutation spectra computation for SARS-CoV-2.…”
Section: Introduction (mentioning)
confidence: 99%