Viral Quasispecies Assembly via Maximal Clique Enumeration

Töpfer, Armin; Marschall, Tobias; Bull, Rowena A.; Luciani, Fabio; Schönhuth, Alexander; Beerenwinkel, Niko

doi:10.1007/978-3-319-05269-4_25

Cited by 20 publications (33 citation statements). References 44 publications.


“…For this error correction step, approximate suffix-prefix overlaps are computed to establish an initial read overlap graph. Inspired by Baaijens et al (2017) and Töpfer et al (2014), maximal cliques are enumerated in the non-oriented graph and errors are corrected by inspecting the read overlaps within the cliques. By design of the overlap graph (edges indicate that two reads stem from identical haplotypes), every clique only contains reads from identical haplotypes, which allows errors to be eliminated based on majority votes.…”

Section: Methods (mentioning; confidence: 99%)
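The clique-based correction described in this excerpt can be sketched in Python. This is a minimal illustration, not the implementation of Töpfer et al (2014) or Baaijens et al (2017): it assumes the undirected overlap graph is already given as an adjacency dictionary without self-loops, uses the textbook Bron-Kerbosch algorithm with pivoting to enumerate maximal cliques, and assumes each read carries a hypothetical offset that places all reads of a clique in one shared coordinate system (substitutions only, no indels).

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def bron_kerbosch(r: Set[int], p: Set[int], x: Set[int],
                  adj: Dict[int, Set[int]], out: List[Set[int]]) -> None:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    if not p and not x:
        out.append(r)  # r cannot be extended: it is a maximal clique
        return
    # Pivot on the vertex with the most neighbours in p to prune branches.
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    out: List[Set[int]] = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return out


def majority_correct(clique: Set[int],
                     reads: Dict[int, Tuple[int, str]]) -> Dict[int, str]:
    """Replace every base by the majority base of its column in the clique.

    reads maps read id -> (offset, sequence); the offsets are a hypothetical
    input assumed to align all reads of the clique to shared coordinates.
    """
    columns: Dict[int, Counter] = {}
    for rid in clique:
        off, seq = reads[rid]
        for i, base in enumerate(seq):
            columns.setdefault(off + i, Counter())[base] += 1
    corrected = {}
    for rid in clique:
        off, seq = reads[rid]
        corrected[rid] = "".join(
            columns[off + i].most_common(1)[0][0] for i in range(len(seq)))
    return corrected
```

Because every read in a clique is assumed to stem from the same haplotype, a base that disagrees with its column majority is treated as a sequencing error. Ties are broken by whichever base was counted first, which a real tool would need to handle more carefully.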

“…In terms of assembly paradigms, POLYTE is an overlap graph based approach. It adopts ideas from earlier work that either focused on variant discovery (Marschall et al, 2012), viral quasispecies assembly (Baaijens et al, 2017; Töpfer et al, 2014) or metagenome gene assembly (Gregor et al, 2016), and unites the virtues of Marschall et al (2012), namely the ability to handle low coverage, on the one hand, and those of Baaijens et al (2017) and Töpfer et al (2014), namely dealing with real overlap graphs and contig computation, on the other hand. That is, POLYTE brings forth an iterative overlap graph based scheme for contig generation that reliably works in low coverage settings, requiring a coverage as low as 5x per haplotype.…”

Section: Introduction (mentioning; confidence: 99%)
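The overlap graph that these schemes start from is built from approximate suffix-prefix overlaps. The sketch below is an assumption-laden illustration rather than POLYTE's code: it compares every ordered pair of reads naively in quadratic time, allows only substitutions (no indels), and uses the hypothetical parameters `min_len` and `max_mismatch_rate` to decide whether an overlap counts as an edge.

```python
from typing import Dict, List, Tuple


def suffix_prefix_overlap(a: str, b: str, min_len: int,
                          max_mismatch_rate: float) -> int:
    """Return the longest length L such that the last L bases of a match the
    first L bases of b with at most max_mismatch_rate * L mismatches, or 0
    if no such overlap of at least min_len bases exists."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0


def overlap_edges(reads: List[str], min_len: int = 3,
                  max_mismatch_rate: float = 0.0) -> Dict[Tuple[int, int], int]:
    """Directed edges (i, j) -> overlap length, meaning that a suffix of
    read i overlaps a prefix of read j."""
    edges: Dict[Tuple[int, int], int] = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            length = suffix_prefix_overlap(a, b, min_len, max_mismatch_rate)
            if length:
                edges[(i, j)] = length
    return edges
```

Real assemblers avoid the all-pairs comparison with indexing structures; the quadratic loop here only keeps the example short.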


Overlap graph-based generation of haplotigs for diploids and polyploids

Baaijens, Schönhuth (2019). Bioinformatics. Self-citation.


Motivation: Haplotype-aware genome assembly plays an important role in genetics, medicine and various other disciplines, yet generation of haplotype-resolved de novo assemblies remains a major challenge. Beyond distinguishing between errors and true sequential variants, one needs to assign the true variants to the different genome copies. Recent work has pointed out that the enormous quantities of traditional NGS read data have been greatly underexploited in terms of haplotig computation so far, which reflects that methodology for reference independent haplotig computation has not yet reached maturity. Results: We present POLYploid genome fitTEr (POLYTE) as a new approach to de novo generation of haplotigs for diploid and polyploid genomes of known ploidy. Our method follows an iterative scheme where in each iteration reads or contigs are joined, based on their interplay in terms of an underlying haplotype-aware overlap graph. Along the iterations, contigs grow while preserving their haplotype identity. Benchmarking experiments on both real and simulated data demonstrate that POLYTE establishes new standards in terms of error-free reconstruction of haplotype-specific sequence. As a consequence, POLYTE outperforms state-of-the-art approaches in various relevant aspects, where advantages become particularly distinct in polyploid settings. Availability and implementation: POLYTE is freely available as part of the HaploConduct package at https://github.com/HaploConduct/HaploConduct, implemented in Python and C++. Supplementary information: Supplementary data are available at Bioinformatics online.
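The iterative join-and-grow scheme summarized in this abstract can be illustrated with a deliberately simplified rule. In this hypothetical sketch (not POLYTE's actual haplotype-aware criteria), two contigs are joined only when their exact suffix-prefix overlap is unambiguous in both directions, so a branch point, such as two haplotypes diverging after a shared segment, blocks the join and keeps the haplotypes apart. Rounds repeat until no further join is possible; `min_len` is an illustrative parameter.

```python
from typing import Dict, List, Tuple


def exact_overlap(a: str, b: str, min_len: int) -> int:
    """Longest proper exact suffix(a) == prefix(b) overlap, or 0."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def merge_round(contigs: List[str], min_len: int) -> List[str]:
    """One iteration: join pairs whose overlap is unambiguous both ways."""
    n = len(contigs)
    succ: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(n)}
    pred: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(n):
            if i != j:
                olen = exact_overlap(contigs[i], contigs[j], min_len)
                if olen:
                    succ[i].append((j, olen))
                    pred[j].append(i)
    used = set()
    merged = []
    for i in range(n):
        if i in used or len(succ[i]) != 1:
            continue  # no successor, or ambiguous (branching) successors
        j, olen = succ[i][0]
        if j in used or len(pred[j]) != 1:
            continue
        merged.append(contigs[i] + contigs[j][olen:])
        used.update({i, j})
    merged.extend(c for k, c in enumerate(contigs) if k not in used)
    return merged


def iterate_until_stable(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeat merge rounds; each productive round shrinks the contig set."""
    contigs = list(reads)
    while True:
        nxt = merge_round(contigs, min_len)
        if len(nxt) == len(contigs):
            return sorted(nxt)
        contigs = nxt
```

For instance, with the reads `AAACCC`, `CCCGGG` and `CCCTTT`, the first read has two equally valid successors, so this rule leaves all three unmerged rather than guessing which haplotype continues.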


“…The real composition of viral populations also demands new classification approaches to group components of mutant spectra (either from one isolate, from sequential isolates from one infected host, or from different hosts). Computational methods to organize and interpret the increasing numbers of minority variants being discovered in viral quasispecies have been developed (Prosperi et al, 2011; Poh et al, 2013; Gregori et al, 2014; Mangul et al, 2014; Töpfer et al, 2014; for review see Marz et al, 2014). PAQ groups those viral sequences that are separated by the shortest genetic distances.…”

Section: Viral Quasispecies (mentioning; confidence: 99%)
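The excerpt says only that PAQ groups sequences separated by the shortest genetic distances; its exact criterion is not given here. As a generic stand-in, the sketch below performs single-linkage grouping: aligned sequences of equal length are joined whenever their Hamming distance is at most a hypothetical threshold `max_dist`, using a union-find structure.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def group_by_distance(seqs: List[str], max_dist: int) -> List[List[int]]:
    """Single-linkage groups: i and j share a group if a chain of pairs,
    each within max_dist, connects them."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)  # union the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Single linkage chains sequences together, so two members of one group may be farther apart than `max_dist`; other linkage rules would give tighter groups.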

Darwinian Principles Acting on Highly Mutable Viruses

Domingo (2016). Virus as Populations.


“…They often rely on the availability of closely related reference genomes of the studied species (Ahn et al, 2015; Töpfer et al, 2014; Zagordi et al, 2011): reads are first mapped onto a reference genome using a read mapping tool, e.g. BWA (Li and Durbin, 2009), and strain variants are then identified through a reference-guided, strain-aware assembly.…”

Section: Introduction (mentioning; confidence: 99%)
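The second step of the reference-guided workflow described here (map reads, then identify variants) can be sketched as follows. The sketch assumes the reads have already been placed on the reference by a mapper such as BWA and are given as hypothetical `(position, sequence)` pairs without insertions or deletions; it builds a pileup and reports non-reference bases whose frequency and depth reach the illustrative thresholds `min_freq` and `min_depth`. It is not the method of any of the cited tools.

```python
from collections import Counter
from typing import Dict, List, Tuple


def pileup(aligned: List[Tuple[int, str]]) -> Dict[int, Counter]:
    """Count the bases observed at every reference position."""
    cols: Dict[int, Counter] = {}
    for pos, seq in aligned:
        for k, base in enumerate(seq):
            cols.setdefault(pos + k, Counter())[base] += 1
    return cols


def call_variants(reference: str, aligned: List[Tuple[int, str]],
                  min_freq: float = 0.2,
                  min_depth: int = 3) -> List[Tuple[int, str, str, float]]:
    """Return (position, ref base, alt base, alt frequency) tuples."""
    variants = []
    for pos, counts in sorted(pileup(aligned).items()):
        depth = sum(counts.values())
        if depth < min_depth or pos >= len(reference):
            continue  # too little evidence, or read runs past the reference
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_freq:
                variants.append((pos, reference[pos], base, n / depth))
    return variants
```

Low-frequency alternative bases are exactly what distinguishes minority strains from the dominant one, which is why a frequency threshold rather than a majority rule is used here.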

“…In this line, there has been recent evidence that shorter genomes can be assembled from short reads through overlap graph based approaches, which make use of full-length reads (Simpson and Durbin, 2012). It was also shown that one can perform strain aware assembly through iterative construction of overlap graphs (Töpfer et al, 2014). For gene assembly from metagenomic data, the SAT assembler (Zhang et al, 2014) can be employed.…”

Section: Introduction (mentioning; confidence: 99%)

Snowball: strain aware gene assembly of metagenomes

2016


Motivation: Gene assembly is an important step in functional analysis of shotgun metagenomic data. Nonetheless, strain aware assembly remains a challenging task, as current assembly tools often fail to distinguish among strain variants or require closely related reference genomes of the studied species to be available. Results: We have developed Snowball, a novel strain aware gene assembler for shotgun metagenomic data that does not require closely related reference genomes to be available. It uses profile hidden Markov models (HMMs) of gene domains of interest to guide the assembly. Our assembler performs gene assembly of individual gene domains based on read overlaps and error correction using read quality scores at the same time, which results in very low per-base error rates. Availability and Implementation: The software runs on a user-defined number of processor cores in parallel, runs on a standard laptop and is available under the GPL 3.0 license for installation under Linux or OS X at https://github.com/hzi-bifo/snowball.
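Snowball's abstract mentions error correction using read quality scores. The exact scheme is not described here, so the sketch below shows a generic alternative: a quality-weighted consensus in which each base votes with its Phred-derived probability of being correct, 1 - 10^(-Q/10). Reads are assumed to be pre-aligned with hypothetical offsets and no indels; positions covered by no read become `N`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def phred_to_prob_correct(q: int) -> float:
    """Probability that a base with Phred quality q is correct."""
    return 1.0 - 10 ** (-q / 10)


def quality_weighted_consensus(
        aligned: List[Tuple[int, str, List[int]]]) -> str:
    """aligned: (offset, sequence, Phred qualities) in shared coordinates."""
    weights: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for off, seq, quals in aligned:
        if len(seq) != len(quals):
            raise ValueError("sequence and quality lengths differ")
        for k, (base, q) in enumerate(zip(seq, quals)):
            weights[off + k][base] += phred_to_prob_correct(q)
    if not weights:
        return ""
    span = range(min(weights), max(weights) + 1)
    # Pick the base with the largest summed weight; uncovered columns -> N.
    return "".join(
        max(weights[p].items(), key=lambda kv: kv[1])[0] if weights[p] else "N"
        for p in span)
```

Weighting by quality lets a single high-confidence base outvote several low-quality disagreeing ones, which plain majority voting cannot do.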
