2020
DOI: 10.1093/bioinformatics/btaa640

Efficient dynamic variation graphs

Abstract: Motivation: Pangenomics is a growing field within computational genomics. Many pangenomic analyses use bidirected sequence graphs as their core data model. However, implementing and correctly using this data model can be difficult, and the scale of pangenomic data sets can be challenging to work at. These challenges have impeded progress in this field. Results: Here we present a stack of two C++ libraries, libbdsg and libhandl…

Cited by 28 publications (27 citation statements)
References 7 publications
“…Following the generation of alignments with CACTUS, we used a custom pipeline to detect nodes that were not present in the Hereford genome, ARS-UCD1.2, considered as the reference genome. We first used a custom python script and the libbdsg [54] library to extract the nodes not present in any Hereford paths. These nodes have then been screened for Nmers, and then misassembled regions detected by FRC_Align [30] on the two de novo assemblies here presented were discarded.…”
Section: Genome Alignment and Comparison (mentioning)
confidence: 99%
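
The filtering step described in this statement can be illustrated with a minimal sketch. It does not use libbdsg and is not the citing authors' script: the graph is modelled as a plain dictionary from path names to lists of node IDs, and the function name `nodes_absent_from_reference`, the `"hereford"` name prefix, and the toy data are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def nodes_absent_from_reference(
    all_nodes: Iterable[int],
    paths: Dict[str, List[int]],
    reference_prefix: str,
) -> Set[int]:
    """Return the node IDs that no reference path visits.

    A path counts as a reference path when its name starts with
    reference_prefix (an assumed naming convention).
    """
    reference_nodes: Set[int] = set()
    for name, node_ids in paths.items():
        if name.startswith(reference_prefix):
            reference_nodes.update(node_ids)
    return set(all_nodes) - reference_nodes


# Toy graph: seven nodes, three embedded paths; node 7 lies on no path.
toy_nodes = range(1, 8)
toy_paths = {
    "hereford.chr1": [1, 2, 4, 5],
    "angus.chr1": [1, 3, 4, 5],
    "brahman.chr1": [1, 2, 4, 6],
}

non_reference = nodes_absent_from_reference(toy_nodes, toy_paths, "hereford")
print(sorted(non_reference))
```

With a real graph library, the two inputs would presumably come from iterating over the graph's nodes and over the steps of each embedded path; the set difference itself stays the same.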
“…We generated a linear expanded genome with the purpose of providing an easy to use, expanded version of the cattle reference genome that is also easy to implement in current best practice pipelines. We extracted all nodes not present in the linear Hereford genome, but that were found in the other 4 assemblies considered using libbdsg (v0.3) [54]. Nodes were then labelled based on the genome in which they were found (i.e.…”
Section: Linear Expanded Genome (mentioning)
confidence: 99%
“…3. Their embedded paths are locally similar to each other. These properties are used to build efficient dynamic variation graph data structures (Siren et al, 2020; Eizenga et al, 2020a). Sparsity (1) allows us to encode edges E using adjacency lists rather than matrices or hash tables.…”
Section: Methods (mentioning)
confidence: 99%
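
To make the adjacency-list point concrete, here is a minimal sketch of a flat, array-based adjacency list (an offsets array plus a targets array). It is an illustration only, not the data structure of the cited work: it ignores node orientation, which a bidirected sequence graph would need, and the names `build_adjacency` and `neighbors` are hypothetical. For a sparse graph it stores O(|V| + |E|) integers, where an adjacency matrix would need O(|V|^2) entries.

```python
from typing import List, Tuple


def build_adjacency(
    num_nodes: int, edges: List[Tuple[int, int]]
) -> Tuple[List[int], List[int]]:
    """Pack directed edges into (offsets, targets) arrays.

    The neighbours of node v are targets[offsets[v]:offsets[v + 1]].
    """
    # Count the out-degree of each node, shifted by one position.
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix sums turn the counts into start positions.
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]
    # Place each target in the next free slot of its source node.
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies, so offsets is not modified
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets: List[int], targets: List[int], node: int) -> List[int]:
    """Return the out-neighbours of node."""
    return targets[offsets[node]:offsets[node + 1]]


# Toy graph with one bubble: 0 -> {1, 2} -> 3.
toy_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
offsets, targets = build_adjacency(4, toy_edges)
print(neighbors(offsets, targets, 0))
```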
“…This is not always identical to the original sequence graph, as some nodes and edges may not be visited by any haplotype. In order to support the handle graph interface [43], we need some additional structures:…”
Section: /31 (mentioning)
confidence: 99%