2019
DOI: 10.1186/s13015-019-0147-6

Linear time minimum segmentation enables scalable founder reconstruction

Abstract: Background: We study a preprocessing routine relevant in pan-genomic analyses: consider a set of aligned haplotype sequences of complete human chromosomes. Due to the enormous size of such data, one would like to represent this input set with a few founder sequences that retain as well as possible the contiguities of the original sequences. Such a smaller set gives a scalable way to exploit pan-genomic information in further analyses (e.g. read alignment and variant calli…

Cited by 14 publications (31 citation statements)
References 22 publications
“…Such an optimization approach might improve coverage (and therefore accuracy) while removing the random element. This might be accomplished using unsupervised, sequence-driven clustering methods [34,35], using the "founder sequence" framework [36,37], or using some form of submodular optimization [38]. A more radical idea is to simply index all available individuals, forgoing the need to choose representatives; this is becoming more practical with the advent of new approaches for haplotype-aware path indexing [31] and efficient indexing for repetitive texts [39].…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Our approach, building from local MSAs and only collapsing haplotypes when they agree for a fixed number of bases, preserves more haplotype structure and avoids combinatorial explosion. Another alternative approach was recently taken by Norri et al. [51], inferring a set of pseudo founder genomes from which to build the graph.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Such an optimization approach might improve coverage (and therefore accuracy) while removing the random element. This might be accomplished using unsupervised, sequence-driven clustering methods [36,37], using the "founder sequence" framework [38,39], or using some form of submodular optimization [40]. A more radical idea is to simply index all available individuals, forgoing the need to choose representatives; this is becoming more practical with the advent of new approaches for haplotype-aware path indexing [33] and efficient indexing for repetitive texts [41].…”
Section: Discussion (citation type: mentioning)
confidence: 99%