SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21) 2021
DOI: 10.1137/1.9781611976830.12

BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper

Abstract: Recent advances in long-read sequencing allow characterization of genome structure and its variation within and between species at a resolution not previously possible. Detection of overlap between reads is an essential component of many long-read genome pipelines, such as de novo genome assembly. Longer reads simplify genome assembly and improve reconstruction contiguity, but current long-read technologies are associated with moderate to high error rates. In this work, we present Berkeley Efficient Long-Read t…

Cited by 8 publications (5 citation statements)
References: 22 publications
“…It uses cardinality estimation methods highly specialized for set union and set intersection. BELLA [40] is an overlap detection and alignment algorithm that defines genome similarity calculations using sparse matrix-matrix multiplication. Similarly, GenomeAtScale [10], the SOTA algorithm, is a matrix-matrix multiplication approach to genome similarity that allows for parallel data compression and batch computation.…”
Section: Related Work (mentioning)
confidence: 99%
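The idea in the statement above, overlap detection expressed as a sparse matrix product, can be illustrated with a minimal sketch. It assumes a 0/1 reads-by-k-mers matrix A, so that entry (i, j) of A·Aᵀ counts the distinct k-mers shared by reads i and j. The function names, the toy reads, and the choice k = 4 are hypothetical, and this is not BELLA's actual implementation.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_counts(reads, k):
    """Nonzero entries (i < j) of A * A^T for a sparse 0/1 reads-by-k-mers matrix A.

    Illustrative only: an assumption-laden sketch, not the method from the paper.
    """
    # Column view of A: each k-mer maps to the reads (rows) that contain it.
    # Reads are visited in order, so every row list is already sorted.
    columns = defaultdict(list)
    for idx, seq in enumerate(reads):
        for km in kmers(seq, k):
            columns[km].append(idx)

    # Column-by-column accumulation: a k-mer present in reads i and j
    # contributes 1 to entry (i, j) of the product.
    counts = defaultdict(int)
    for rows in columns.values():
        for i, j in combinations(rows, 2):
            counts[(i, j)] += 1
    return dict(counts)


# Hypothetical toy reads; the first two share GTAC, TACG and ACGA.
toy_reads = ["ACGTACGA", "GTACGATT", "TTTTTTTT"]
print(shared_kmer_counts(toy_reads, 4))  # {(0, 1): 3}
```

Only pairs that share at least one k-mer appear in the result, which is why a sparse formulation avoids comparing every read against every other read.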
“…The range is adjusted as the algorithm proceeds, attempting to follow the highest score. The best way to do this is unclear: several ways have been suggested [7, 25–28].…”
Section: Review Of Standard Alignment (mentioning)
confidence: 99%
“…The rationale for choosing a region with least 'N' content is simple: to improve the mapping quality (recall, in particular) for long reads that have repetitive content. We use ℓ = 2000 bp as our segment length for L2L, based on prior works that have suggested similar lengths [8,37].…”
Section: Additional Implementation Details (mentioning)
confidence: 99%
“…However, this approach achieves a very low F1 score [8]. So, the balance between precision and recall is not maintained for complex genomes.…”
Section: Related Work (mentioning)
confidence: 99%
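For context on the F1 remark above: F1 is the harmonic mean of precision and recall, so it stays low whenever either one is low. The sketch below uses the standard definition; the numeric inputs are made-up illustrations, not results from any cited paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: balanced inputs versus high precision with poor recall.
balanced = f1_score(0.5, 0.5)
skewed = f1_score(0.9, 0.1)
print(balanced, skewed)
```

The skewed case scores well below the balanced one even though its precision is much higher, which is the imbalance the statement describes.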