2019
DOI: 10.1093/bioinformatics/btz400

ntEdit: scalable genome sequence polishing

Abstract
Motivation: In the modern genomics era, genome sequence assemblies are routine practice. However, depending on the methodology, resulting drafts may contain considerable base errors. Although utilities exist for genome base polishing, they work best with high read coverage and do not scale well. We developed ntEdit, a Bloom filter-based genome sequence editing utility that scales to large mammalian and conifer genomes.
Results: …
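The abstract describes ntEdit only as "Bloom filter-based". As a rough illustration of that general idea, and not ntEdit's actual algorithm or code, the sketch below loads read k-mers into a simple Bloom filter, flags draft k-mers that the reads do not support, and tries single-base substitutions until the overlapping k-mers are supported again. The hashing scheme (salted SHA-256), filter size, k, and all function names are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative; not ntEdit's implementation)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive each bit position by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Can report false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(seq: str, k: int):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_read_filter(reads, k, num_bits=1 << 20, num_hashes=3):
    """Hypothetical helper: insert all read k-mers into one Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in kmers(read, k):
            bf.add(kmer)
    return bf


def unsupported_positions(draft, bf, k):
    """Return start offsets of draft k-mers that are absent from the read filter."""
    return [i for i, kmer in enumerate(kmers(draft, k)) if kmer not in bf]


def try_substitution(draft, pos, bf, k):
    """Try each alternative base at draft[pos]; return the first one that makes
    every k-mer overlapping pos present in the filter, or None."""
    start = max(0, pos - k + 1)
    end = min(pos, len(draft) - k)
    for base in "ACGT":
        if base == draft[pos]:
            continue
        candidate = draft[:pos] + base + draft[pos + 1:]
        if all(candidate[i:i + k] in bf for i in range(start, end + 1)):
            return base
    return None
```

For example, with reads `"ACGTTGCAAGG"` and `"GCAAGGCTTAC"`, k = 5, and a draft `"ACGTTGCTAGGCTTAC"` carrying a T where the reads imply an A at position 7, the draft k-mers starting at offsets 3 to 7 are flagged, and `try_substitution` proposes `"A"` at position 7. A real polisher must also handle insertions and deletions, which this sketch omits.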

Cited by 77 publications (80 citation statements). References: 13 publications (14 reference statements).
“…Using the polished output from Pilon, we repeated the short read polishing two additional times and observed moderate improvements in BUSCO scores. Finally, due to the low Illumina sequencing coverage, we employed an additional polishing step utilizing ntEdit [27], which functions well in low sequence coverage situations. We observed a slight improvement recovering an additional 81 complete BUSCOs, ultimately obtaining 90% BUSCO completeness with ~5% listed as fragmented or missing, respectively.…”
Section: Results; citation type: mentioning; confidence: 99%
“…Left indicates the number of rounds of each program, and bars display BUSCO notation. (D) ntEdit [27] was performed using the eHAP1 short reads on the 5x Racon/3x Pilon (eHAP1 Only) polished assembly and using eHAP1 then HAP1 (eHAP1-HAP1) or HAP1 then eHAP1 (HAP1-eHAP1) short reads. BUSCO scores were calculated after each round.…”
Section: Data Accession; citation type: mentioning; confidence: 99%
“…It can be applied during and after the genome assembly. Usually, long-read assemblers perform a single round of long-read polishing [14,16,17], that is followed by several rounds of polishing with long [15,17,19,21] and short [15,20,22] reads using third-party tools [15,17,19–22]. Currently, polishing large genomes, such as the human genome, can take much more computational time than the long-read assembly itself [14,16,17].…”
Citation type: mentioning; confidence: 99%
“…This new assembly approach gave rise to some criticisms because even after several rounds of polishing, a substantial fraction of consensus errors remains, hampering the subsequent genome analyses such as gene and protein prediction [23]. When the aforementioned assembly approach employs short-read polishing [15,20,22], then it corresponds to a long-read-first hybrid assembly strategy [24,25]. Another hybrid assembly strategy consists in starting the assembly process with short reads [26].…”
Citation type: mentioning; confidence: 99%