Distance Indexing and Seed Clustering in Sequence Graphs

Chang, Xian; Eizenga, Jordan M.; Novak, Adam M.; Sirén, Jouni; Paten, Benedict

doi:10.1101/2019.12.20.884924

Cited by 2 publications

(3 citation statements)

References 18 publications


“…The clustering algorithm in vg mpmap is built around the distance index described in [71]. In brief, this index can query the minimum distance between two positions in the pangenome graph by expressing the distance as the sum of a small number of precomputed distances.…”

Section: Methods (mentioning)

confidence: 99%

“…Edges that are much longer than the read length are not added; this avoids treating distal elements on the same chromosome as part of the same cluster. In addition, we accelerate this process using Algorithm 3 from [71], which partitions seeds into equivalence classes based on the distance between them. The equivalence relation is the transitive closure of the relation of being connected by a path of length less than d, which is a tunable parameter.…”

Section: Methods (mentioning)

confidence: 99%
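
The equivalence relation in the statement above can be illustrated with a small union-find sketch. This is not Algorithm 3 from [71]: it compares all pairs of seeds, which the cited algorithm is designed to avoid by using the distance index. The names `cluster_seeds` and `linear_distance` are hypothetical, and the seeds here are plain integer positions on a line rather than positions in a pangenome graph.

```python
from typing import Callable, List, Sequence


def cluster_seeds(
    seeds: Sequence[int],
    distance: Callable[[int, int], int],
    d: int,
) -> List[List[int]]:
    """Partition seeds into classes under the transitive closure of
    'distance(a, b) < d'. Naive all-pairs version, for illustration only."""
    parent = list(range(len(seeds)))

    def find(i: int) -> int:
        # Walk to the root, shortening the path as we go (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the classes of every pair of seeds closer than d.
    for i in range(len(seeds)):
        for j in range(i + 1, len(seeds)):
            if distance(seeds[i], seeds[j]) < d:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Collect seeds by the root of their class.
    clusters = {}
    for i, seed in enumerate(seeds):
        clusters.setdefault(find(i), []).append(seed)
    return sorted(sorted(members) for members in clusters.values())


def linear_distance(a: int, b: int) -> int:
    # Assumption: stand-in for a graph minimum-distance query.
    return abs(a - b)


if __name__ == "__main__":
    print(cluster_seeds([10, 12, 15, 100, 103, 500], linear_distance, 5))
```

In this toy input, seeds 10 and 15 are exactly 5 apart, so they are not directly related, yet they land in the same cluster because each is within distance 5 of seed 12. That is the effect of taking the transitive closure.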


Haplotype-aware pantranscriptome analyses using spliced pangenome graphs

Sibbesen, Eizenga, Novak, et al. 2021

Preprint

Self Cite


Pangenomics is emerging as a powerful computational paradigm in bioinformatics. This field uses population-level genome reference structures, typically consisting of a sequence graph, to mitigate reference bias and facilitate analyses that were challenging with previous reference-based methods. In this work, we extend these methods into transcriptomics to analyze sequencing data using the pantranscriptome: a population-level transcriptomic reference. Our novel toolchain can construct spliced pangenome graphs, map RNA-seq data to these graphs, and perform haplotype-aware expression quantification of transcripts in a pantranscriptome. This workflow improves accuracy over state-of-the-art RNA-seq mapping methods, and it can efficiently quantify haplotype-specific transcript expression without needing to characterize a sample's haplotypes beforehand.


“…Given the additional complexities and overheads of processing a genome graph instead of a linear reference genome, graph-based analysis exacerbates the bottlenecks of read-to-reference mapping. Due to the nascent nature of sequence-to-graph mapping, a much smaller number of software tools (and no hardware accelerators) exist for sequence-to-graph mapping [36,54,61,66,77-83] compared to the traditional sequence-to-sequence mapping.…”

Section: Introduction (mentioning)

confidence: 99%

SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping

Cali, Kanellopoulos, Lindegger, et al. 2022

Preprint


A critical step of genome sequence analysis is the mapping of sequenced DNA fragments (i.e., reads) collected from an individual to a known linear reference genome sequence (i.e., sequence-to-sequence mapping). Recent works replace the linear reference sequence with a graph-based representation of the reference genome, which captures the genetic variations and diversity across many individuals in a population. Mapping reads to the graph-based reference genome (i.e., sequence-to-graph mapping) results in notable quality improvements in genome analysis. Unfortunately, while sequence-to-sequence mapping is well studied with many available tools and accelerators, sequence-to-graph mapping is a more difficult computational problem, with a much smaller number of practical software tools currently available. We analyze two state-of-the-art sequence-to-graph mapping tools and reveal four key issues. We find that there is a pressing need to have a specialized, high-performance, scalable, and low-cost algorithm/hardware co-design that alleviates bottlenecks in both the seeding and alignment steps of sequence-to-graph mapping. Since sequence-to-sequence mapping can be treated as a special case of sequence-to-graph mapping, we aim to design an accelerator that is efficient for both linear and graph-based read mapping. To this end, we propose SeGraM, a universal algorithm/hardware co-designed genomic mapping accelerator that can effectively and efficiently support both sequence-to-graph mapping and sequence-to-sequence mapping, for both short and long reads. To our knowledge, SeGraM is the first algorithm/hardware co-design for accelerating sequence-to-graph mapping.
SeGraM consists of two main components: (1) MinSeed, the first minimizer-based seeding accelerator, which finds the candidate locations in a given genome graph; and (2) BitAlign, the first bitvector-based sequence-to-graph alignment accelerator, which performs alignment between a given read and the subgraph identified by MinSeed. We couple SeGraM with high-bandwidth memory to exploit low latency and highly parallel memory access, which alleviates the memory bottleneck.
