Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2012
DOI: 10.1145/2382936.2382984

Scalable genome scaffolding using integer linear programming

Abstract: The rapidly diminishing cost of genome sequencing is driving renewed interest in large-scale genome sequencing programs such as Genome 10K (G10K). Despite this renewed interest, the assembly of large genomes from short reads is still an extremely resource-intensive process. This work presents a scalable algorithm to create scaffolds, or ordered and oriented sets of assembled contigs, which is one part of a practical assembly. This is accomplished using integer linear programming (ILP). In order to process large mam…

Citations: cited by 6 publications (7 citation statements)
References: 33 publications
“…Note that in the ordering O the estimation of gaps between adjacent contigs uniquely defines the gap g_O(i, j) between any pair of connected contigs i and j. SILP2 [3] extracts the maximum subgraph-ordering out of G. Instead, SILP3 is aimed to find an ordering with the maximum support of G-edges.…”
Section: Maximum Likelihood Ordering
Citation type: mentioning
confidence: 99%
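The statement above says that, once an ordering fixes the gaps between adjacent contigs, the gap between any other pair of contigs follows from them. A minimal sketch of that arithmetic, assuming the implied gap is the sum of the adjacent gaps plus the lengths of the contigs lying in between (the function name `implied_gap` and its parameters are hypothetical, not taken from SILP2 or SILP3):

```python
from typing import Sequence


def implied_gap(lengths: Sequence[int], adjacent_gaps: Sequence[int], i: int, j: int) -> int:
    """Gap between the contigs at positions i < j of an ordering.

    Assumption (not from the cited papers): the gap is every adjacent gap
    between positions i and j plus the lengths of the contigs strictly
    between them.
    """
    if len(adjacent_gaps) != len(lengths) - 1:
        raise ValueError("need exactly one gap per adjacent contig pair")
    if not 0 <= i < j < len(lengths):
        raise ValueError("positions must satisfy 0 <= i < j < number of contigs")
    # adjacent_gaps[k] is the gap between the contigs at positions k and k + 1.
    return sum(adjacent_gaps[i:j]) + sum(lengths[i + 1:j])


# Three contigs of lengths 100, 50, 80 with adjacent gaps 10 and 20.
print(implied_gap([100, 50, 80], [10, 20], 0, 2))
```

Under this assumption the first and third contigs are separated by the two adjacent gaps and the full length of the middle contig.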
“…The flow of SILP2 [3] consists of the following steps: 1) mapping reads onto contigs 2) scaffolding graph construction 3) maximum likelihood contig orientation via ILP 4) decomposition into paths of orientation compatible edges via bipartite matching 5) maximum likelihood gap estimation II. MAXIMUM LIKELIHOOD ORIENTATION Let G = (V, E) be the scaffolding graph connecting vertices-contigs with edges-read pairs.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
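Step 3 of the flow quoted above orients contigs using the scaffolding graph. As a toy illustration only, the sketch below swaps the ILP for exhaustive search and swaps the likelihood objective for a simple count of satisfied edges. The edge encoding (a boolean saying whether a read pair implies that two contigs share an orientation) and the name `best_orientation` are assumptions made for this example, not the SILP2 formulation.

```python
from itertools import product
from typing import List, Tuple

# (u, v, same): a read pair linking contigs u and v; same=True means the pair
# is consistent only if both contigs have the same orientation (hypothetical encoding).
Edge = Tuple[int, int, bool]


def best_orientation(n: int, edges: List[Edge]) -> Tuple[Tuple[int, ...], int]:
    """Exhaustively pick orientations (0 = forward, 1 = reverse) for n contigs
    that satisfy the most edges. Exponential in n, so toy inputs only."""
    if n < 1:
        raise ValueError("need at least one contig")
    best: Tuple[int, ...] = (0,) * n
    best_score = -1
    # Flipping every contig leaves all edges unchanged, so fix contig 0 as forward.
    for rest in product((0, 1), repeat=n - 1):
        assign = (0,) + rest
        score = sum(1 for u, v, same in edges if (assign[u] == assign[v]) == same)
        if score > best_score:
            best, best_score = assign, score
    return best, best_score


# A triangle whose constraints cannot all hold: at most two edges are satisfiable.
orientation, satisfied = best_orientation(3, [(0, 1, True), (1, 2, False), (0, 2, True)])
print(orientation, satisfied)
```

An ILP solver would express the same choice with one binary variable per contig and one per edge; the brute-force loop is used here only to keep the example dependency-free.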
“…Based on the scaffold assembly formulation from Huson et al [21], this study proposes extensions to an exact algorithm [13] that make it feasible for scaffolding large, repeat-rich genomes in a time and memory-efficient manner [22,23]. The new approach, termed OPERA-LG, was extensively evaluated against state-of-the-art scaffolders (SSPACE [14], SOPRA [16] and BESST [24]) and assembly pipelines (SOAPdenovo [25] and ALLPATHS-LG [26]) on simulated and real datasets.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In addition to being the first, scalable scaffolding and assembly algorithm with proven performance guarantees (other recent works on scaffolding are either reported to be not scalable or do not have formal guarantees (Dayarian et al 2010; Salmela et al 2011; Lindsay et al 2012)), OPERA-LG incorporates several additional features specifically tailored for producing high quality draft assemblies for large, repeat-rich genomes. These include the ability to simultaneously use data from multiple libraries (a first for stand-alone scaffolding programs and originally implemented in the Celera Assembler at the mate-pair level (Myers et al 2000)) as is typically needed in large assembly projects, an improved edge-length estimation algorithm and an exact extension for scaffolding the repetitive sequences that typically confound assembly tools.…”
Section: Introduction
Citation type: mentioning
confidence: 99%