2017
DOI: 10.1186/s12859-017-1610-3

HALC: High throughput algorithm for long read error correction

Abstract: Background: The third generation PacBio SMRT long reads can effectively address the read length issue of the second generation sequencing technology, but contain approximately 15% sequencing errors. Several error correction algorithms have been designed to efficiently reduce the error rate to 1%, but they discard large amounts of uncorrected bases and thus lead to low throughput. This loss of bases could limit the completeness of downstream assemblies and the accuracy of analysis. Results: Here, we introduce HALC, …

Cited by 68 publications (48 citation statements)
References 38 publications
“…ELECTOR sample output: As previously mentioned, ELECTOR computes general metrics: recall, precision, error rate, among other metrics, and provides a graphic representation of their distributions. A subset of the metrics produced by ELECTOR using reads corrected by the following tools: HALC (19), HG-CoLoR (20), LoRDEC (21), Canu (17), Daccord (Tischler, G., & Myers, E. W. (2017). Non hybrid long read consensus using local de Bruijn graph assembly.…”
Section: Validation On Synthetic Datasets
Citation type: mentioning
confidence: 99%
“…Given the distinct characteristics of long reads, i.e., significantly higher error rates and lengths, specialized algorithms are needed to correct them. Till date, several error correction tools for long reads have been developed including PacBioToCA [17], LSC [18], ECTools [19], LoRDEC [20], proovread [21], NaS [22], Nanocorr [23], Jabba [24], CoLoRMap [25], LoRMA [26], HALC [27], FLAS [28], FMLRC [29], HG-CoLoR [30] and Hercules [31].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…This approach is implemented in tools such as Proovread (Hackl et al., ) and/or LSC (Au et al., ). As it is difficult to unambiguously align individual short reads against long reads, an alternative strategy involves an initial assembly of the short, accurate reads into contigs (HALC, Bao and Lan, ) or assembly graphs (LoRDEC, Salmela and Rivals, 2014) to correct the long reads. A recent comparison of long‐read correction tools found that HALC performed best on data sets from ‘complex’ genomes, such as that of humans or rice (Mahmoud et al., ).…”
Section: Introduction
Citation type: mentioning
confidence: 99%