2022
DOI: 10.1186/s12859-021-04547-0

Lerna: transformer architectures for configuring error correction tools for short- and long-read genome sequencing

Abstract: Background: Sequencing technologies are prone to errors, making error correction (EC) necessary for downstream applications. EC tools need to be manually configured for optimal performance. We find that the optimal parameters (e.g., k-mer size) are both tool- and dataset-dependent. Moreover, evaluating the performance (i.e., Alignment-rate or Gain) of a given tool usually relies on a reference genome, but quality reference genomes are not always available. We introduce Lerna for the automated co…

Cited by 6 publications (7 citation statements)
References 65 publications (87 reference statements)
“…The random forests hereby replace our hand-crafted conditions to decide whether a specific position in a read should be modified. This is in contrast to previous recent machine learning approaches like Athena [18] and Lerna [19] which try to find optimal input parameters for existing correction algorithms. Third, the algorithm has been optimized to reduce both runtime and memory consumption on both CPUs and GPUs.…”
Section: Introduction (mentioning)
confidence: 76%
“…Compared to k-mer-based approaches the performance is drastically lower. Naturally, with longer substrings, the vocabulary size increases allowing for more detailed pattern recognition (28).…”
Section: Results (mentioning)
confidence: 99%
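
The vocabulary growth mentioned in the statement above can be illustrated with a minimal sketch. It assumes the plain DNA alphabet {A, C, G, T}, for which there are at most 4^k distinct k-mers. The function names and the example read are hypothetical and are not taken from Lerna or from the citing paper.

```python
from itertools import product


def kmer_vocabulary(k, alphabet="ACGT"):
    """Return every possible k-mer over the alphabet (4**k strings for DNA)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def observed_kmers(read, k):
    """Return the set of distinct k-mers that actually occur in one read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


if __name__ == "__main__":
    read = "ACGTACGTGA"  # made-up read, for illustration only
    for k in (1, 2, 3, 4):
        # Possible vocabulary size versus k-mers actually seen in the read
        print(k, len(kmer_vocabulary(k)), len(observed_kmers(read, k)))
```

The possible vocabulary grows exponentially in k, while a single short read covers only a small fraction of it. This is one reason the choice of k is a trade-off rather than "larger is better".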
“…Taxonomic classification tools are often based on alignment-free mapping approaches enabling the analysis of millions of reads with a high accuracy. Deep learning has shown its potential in a variety of problems on sequence-based data (12, 13, 28–30). Therefore, new classification approaches were explored using deep learning with good results on genus and species prediction even on non-curated databases (16).…”
Section: Discussion (mentioning)
confidence: 99%
“…This information and factors such as quality scores and genomic coverage contribute to the formulation of features used in training machine learning models. Furthermore, other machine learning methods specialize in identifying the optimal k-mer size essential for independent error correction tools [31, 32] (see Table 1).…”
Section: Introduction (mentioning)
confidence: 99%