2016
DOI: 10.1101/057414
Preprint

tHapMix: simulating tumour samples through haplotype mixtures

Abstract: Motivation: Large-scale rearrangements and copy number changes combined with different modes of clonal evolution create extensive somatic genome diversity, making it difficult to develop versatile and scalable variant calling tools and create well-calibrated benchmarks. Results: We developed a new simulation framework tHapMix that enables the creation of tumour samples with different ploidy, purity and polyclonality features. It easily scales to simulation of hundreds of somatic genomes, while re-use of real r…

Cited by 2 publications (3 citation statements)
References 10 publications
“…We compared HATCHet with six current state-of-the-art methods for copy-number deconvolution, i.e., Battenberg [9], TITAN [17], THetA [21, 22], cloneHD [25], Canopy [37] (with fractional copy numbers from FALCON [15]), and ReMixT [27], on simulated data. Most current studies that simulate DNA sequencing data from mixed samples containing CNAs do not account for the different genome lengths of distinct clones [15–17, 25, 39–44]; this oversight leads to incorrect simulation of read counts (Supplementary Figs. 4 and 5).…”
Section: Results · mentioning
confidence: 99%
“…Assuming that reads are uniformly sequenced along the genome and across all cells, what is the expected proportion $v_i$ of reads that originated from clone $i$? Most current studies [15–17, 25, 39–44] that simulate sequencing reads from mixed samples compute $v_i$ as a function of $u_i$ without taking into account the corresponding genome length $L_i$. For example, Ha et al. [17] and Adalsteinsson et al. [39] artificially form a mixed sample of two clones by mixing reads from two other given samples in proportions…”
Section: Methods · mentioning
confidence: 99%
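Under the uniform-coverage assumption stated in the excerpt, each clone contributes reads in proportion to both its cell fraction $u_i$ and the amount of DNA per cell $L_i$, so the expected read share is $v_i = u_i L_i / \sum_j u_j L_j$ rather than simply $v_i = u_i$. The Python sketch below illustrates this; it assumes (our assumption, not stated in the excerpt) that $L_i$ is the copy-number-weighted sum of segment lengths, and the function names `genome_length` and `read_proportions` are hypothetical helpers, not part of tHapMix or HATCHet.

```python
from fractions import Fraction
from typing import Sequence


def genome_length(copy_numbers: Sequence[int], segment_lengths: Sequence[int]) -> int:
    """Hypothetical helper: total DNA content of one clone,
    i.e. the sum over segments of (total copy number x segment length)."""
    if len(copy_numbers) != len(segment_lengths):
        raise ValueError("need one copy number per segment")
    return sum(c * l for c, l in zip(copy_numbers, segment_lengths))


def read_proportions(cell_fractions: Sequence[Fraction],
                     lengths: Sequence[int]) -> list:
    """Hypothetical helper: expected share of reads per clone,
    v_i = u_i * L_i / sum_j (u_j * L_j), assuming uniform coverage."""
    if len(cell_fractions) != len(lengths):
        raise ValueError("need one genome length per clone")
    if abs(sum(cell_fractions) - 1) > 1e-9:
        raise ValueError("cell fractions must sum to 1")
    weights = [u * L for u, L in zip(cell_fractions, lengths)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    segments = [100, 100]                        # two toy segments of 100 bp each
    normal = genome_length([2, 2], segments)     # diploid everywhere -> 400
    tumour = genome_length([4, 4], segments)     # whole-genome doubled -> 800
    u = [Fraction(1, 2), Fraction(1, 2)]         # equal numbers of cells
    print(read_proportions(u, [normal, tumour]))  # [Fraction(1, 3), Fraction(2, 3)]
    # Ignoring genome length would instead give v = u = [1/2, 1/2].
```

In this toy mixture, half of the cells are tetraploid tumour cells, yet they are expected to contribute two thirds of the reads; mixing source reads 1:1 would therefore under-represent the tumour clone. `Fraction` is used only so that the example proportions are exact.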
“…To accomplish this we have created a number of pedigree sequencing samples using Platinum Genomes (PG) dataset (Eberle et al, 2016) that include: (1) normal trios, (2) negative control replicates of a single sample, (3) de novo enriched trio and quad where parents are derived from the same sample and (4) pedigree simulation through haplotype down-sampling. The latter is an adaptation of the previously described tHapMix simulation framework for somatic variants (Ivakhno et al, 2016) to germline CNVs within a pedigree relationship structure. Truth sets were generated by merging structural variants found in PG data using orthogonal variant calling tools.…”
Section: Results · mentioning
confidence: 99%