2022
DOI: 10.1089/cmb.2022.0266

Algorithms for Colinear Chaining with Overlaps and Gap Costs

Cited by 11 publications (34 citation statements)
References 26 publications
“…We only use k-mer seeds in this study, although other types of seeds are possible (Keich et al, 2004; Kiełbasa et al, 2011). An optimal increasing subsequence of possibly overlapping anchors based on some score is then collected into a chain, where increasing is defined with the standard precedence relationship (Jain et al, 2022) between k-mer anchors (See Figure 5a and Chaining below). The chain is extended into a full alignment by aligning between anchor gaps in the chain.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Co-linear chaining is a mathematically rigorous approach to do clustering of anchors. It is well studied for the case of sequence-to-sequence alignment [1,11,12,16,26,28,32], and is widely used in present-day long read to reference sequence aligners [18,21,35,37,38]. For the sequence-to-sequence alignment case, the input to the chaining problem is a set of N weighted anchors where each anchor is a pair of intervals in the two sequences that match exactly.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…However, the problem formulations in these works did not include gap cost. Without penalizing gaps, chaining is less effective [16]. A challenge in enforcing gap cost is that measuring gap between two loci in a DAG is not a simple arithmetic operation like in a sequence [20].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
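
The two statements above describe the chaining input (weighted anchors, each a pair of exactly matching intervals) and the role of gap costs. A minimal sketch of that idea follows, written as a simple quadratic-time dynamic program. It is an illustration only, not the algorithm of the cited paper: the `Anchor` type, the non-overlapping `precedes` rule, the gap cost (difference between the two gap lengths), and the use of match length as anchor weight are all assumptions chosen for brevity.

```python
from typing import List, NamedTuple, Tuple


class Anchor(NamedTuple):
    """Hypothetical anchor: an exact match of `length` characters
    starting at position `q` in the query and `r` in the reference."""
    q: int
    r: int
    length: int  # also used as the anchor weight (assumption)


def precedes(a: Anchor, b: Anchor) -> bool:
    # Assumed precedence rule: a must end before b starts in both sequences
    # (no overlaps allowed in this simplified sketch).
    return a.q + a.length <= b.q and a.r + a.length <= b.r


def gap_cost(a: Anchor, b: Anchor) -> int:
    # Illustrative gap cost: difference between the gap lengths on the
    # query and on the reference (zero when the anchors lie on one diagonal).
    gap_q = b.q - (a.q + a.length)
    gap_r = b.r - (a.r + a.length)
    return abs(gap_q - gap_r)


def best_chain(anchors: List[Anchor]) -> Tuple[int, List[Anchor]]:
    """Return (score, chain) maximizing total weight minus gap costs.
    Runs in O(n^2) time; assumes every anchor has length >= 1."""
    order = sorted(anchors, key=lambda a: (a.q, a.r))
    n = len(order)
    if n == 0:
        return 0, []
    score = [a.length for a in order]  # best chain score ending at anchor i
    prev = [-1] * n                    # back-pointer for chain recovery
    for i in range(n):
        for j in range(i):
            if precedes(order[j], order[i]):
                cand = score[j] + order[i].length - gap_cost(order[j], order[i])
                if cand > score[i]:
                    score[i], prev[i] = cand, j
    end = max(range(n), key=score.__getitem__)
    best = score[end]
    chain = []
    while end != -1:
        chain.append(order[end])
        end = prev[end]
    chain.reverse()
    return best, chain


if __name__ == "__main__":
    anchors = [Anchor(0, 0, 5), Anchor(6, 6, 4), Anchor(3, 50, 10), Anchor(12, 13, 5)]
    print(best_chain(anchors))
```

In this toy input the anchor at reference position 50 is the longest single match, yet it cannot be combined with the others, so the three near-diagonal anchors form the best chain. The quadratic loop is kept for readability only.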
“…For long reads, there has been a recent breakthrough by sampling and indexing only a relatively small number of short potential seeds from the reference genome, which has led to faster and more accurate mapping tools, e.g., [24] and [16]. Chaining consists of finding maximal subsets of seeds that all agree on a certain genomic location [14]; seeds often have spurious matches due to their short lengths.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…We also show for the first time that indexing only the long minimizer-space seeds (k-min-mers) that occur uniquely in the genome is sufficient for sensitive and specific mapping. Another major conceptual advance is that by leveraging the high specificity of these seeds, we can devise a provably 𝒪(n) time (heuristic) pseudo-chaining algorithm, which improves upon the subsequent best 𝒪(n log n) runtime of all other known colinear chaining methods [14], without loss of performance in practice. We further study why simply using longer k-mers would be suboptimal.…”
Section: Introduction
Citation type: mentioning
confidence: 99%