Phylogenetic techniques are increasingly applied to infer the somatic mutational history of a tumor from DNA sequencing data. However, standard phylogenetic tree reconstruction techniques do not account for the fact that bulk sequencing data measures mutations in a population of cells. We formulate and solve the multi-state perfect phylogeny mixture deconvolution problem of reconstructing a phylogenetic tree given mixtures of its leaves, under the multi-state perfect phylogeny, or infinite alleles model. Our somatic phylogeny reconstruction using combinatorial enumeration (SPRUCE) algorithm uses this model to construct phylogenetic trees jointly from single-nucleotide variants (SNVs) and copy-number aberrations (CNAs). We show that SPRUCE addresses complexities in simultaneous analysis of SNVs and CNAs. In particular, there are often many possible phylogenetic trees consistent with the data, but the ambiguity decreases considerably with an increasing number of samples. These findings have implications for tumor sequencing strategies, suggest caution in drawing strong conclusions based on a single tree reconstruction, and explain difficulties faced by applying existing phylogenetic techniques to tumor sequencing data.
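The mixture idea in this abstract can be illustrated with a minimal sketch. It is not the SPRUCE algorithm: it assumes the simpler two-state case (each mutation gained once and never lost), treats every tree node as a clone that may be present in a sample, and uses hypothetical names (`clone_matrix`, `mix`, `parent`, `U`). It only shows the forward direction, computing the mutation frequencies that a known tree and known mixing proportions would produce; the deconvolution problem runs the other way.

```python
import numpy as np

def clone_matrix(parent):
    """Build a binary clone-by-mutation matrix B from a rooted tree.

    Assumption (illustrative labeling): mutation j is introduced on the
    edge entering node j, so clone i carries every mutation on the path
    from the root to node i. parent[i] is the parent of node i, -1 for the root.
    """
    n = len(parent)
    B = np.zeros((n, n), dtype=int)
    for i in range(n):
        v = i
        while v != -1:          # walk from node i up to the root
            B[i, v] = 1
            v = parent[v]
    return B

def mix(U, B):
    """Frequencies F = U B, where row p of U holds the clone proportions of sample p."""
    return np.asarray(U, dtype=float) @ B

# Toy tree: 0 is the root, 1 and 2 are children of 0, 3 is a child of 1.
parent = [-1, 0, 0, 1]
B = clone_matrix(parent)

# Two bulk samples; each row of U sums to 1.
U = [[0.1, 0.2, 0.3, 0.4],
     [0.5, 0.0, 0.5, 0.0]]
F = mix(U, B)
```

In this toy case, the mutation at the root reaches frequency 1 in both samples, while each other mutation's frequency is the summed proportion of the clones below it. Several different trees can produce the same `F`, which is the ambiguity the abstract describes.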
Metastasis is the migration of cancerous cells from a primary tumor to other anatomical sites. While metastasis was long thought to result from monoclonal seeding, or single cellular migrations, recent phylogenetic analyses of metastatic cancers have reported complex patterns of cellular migrations between sites, including polyclonal migrations and reseeding. However, accurate determination of migration patterns from somatic mutation data is complicated by intra-tumor heterogeneity and discordance between clonal lineage and cellular migration. We introduce MACHINA, a multi-objective optimization algorithm that jointly infers clonal lineages and parsimonious migration histories of metastatic cancers from DNA sequencing data. MACHINA analysis of data from multiple cancers reveals that migration patterns are often not uniquely determined from sequencing data alone, and that complicated migration patterns among primary tumors and metastases may be less prevalent than previously reported. MACHINA’s rigorous analysis of migration histories will aid in studies of the drivers of metastasis.
We describe an algorithm called THetA2 that infers the composition of a tumor sample (including not only tumor purity but also the number and content of tumor subpopulations) directly from both whole-genome (WGS) and whole-exome (WXS) high-throughput DNA sequencing data. This algorithm builds on our earlier Tumor Heterogeneity Analysis (THetA) algorithm in several important directions. These include improved ability to analyze highly rearranged genomes using a variety of data types: both WGS (including low ∼7× coverage) and WXS. We apply our improved THetA2 algorithm to WGS (including low-pass) and WXS sequence data from 18 samples from The Cancer Genome Atlas (TCGA). We find that the improved algorithm is substantially faster and identifies numerous tumor samples containing subclonal populations in the TCGA data, including in one highly rearranged sample for which other tumor purity estimation algorithms were unable to estimate tumor purity.
Highlights
• Single-nucleotide variants (SNVs) and CNAs are markers of cancer evolution
• Copy-number aberrations (CNAs) may overlap SNVs and result in SNV loss
• Loss-supported model constrains SNV losses to loci with a decrease in copy number
• SCARLET integrates SNVs and CNAs yielding more accurate single-cell phylogenies
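The loss-supported constraint in the highlights above can be sketched as a simple filter over tree edges. This is a simplified reading, not SCARLET's implementation: the function name, the edge list, and the per-node copy-number dictionary are all hypothetical, and the locus label is made up for illustration.

```python
def allowed_loss_edges(edges, copy_number, locus):
    """Return the tree edges on which an SNV at `locus` may be lost.

    Assumption (simplified reading of the loss-supported model): a loss is
    permitted on edge (u, v) only if the copy number at the locus decreases
    from parent u to child v.
    """
    return [(u, v) for u, v in edges
            if copy_number[v][locus] < copy_number[u][locus]]

# Toy copy-number tree: parent -> child edges.
edges = [("root", "a"), ("a", "b"), ("root", "c")]

# Hypothetical copy numbers at one locus for each node.
copy_number = {
    "root": {"locus1": 2},
    "a":    {"locus1": 1},   # deletion relative to root
    "b":    {"locus1": 1},   # unchanged relative to a
    "c":    {"locus1": 3},   # gain relative to root
}

supported = allowed_loss_edges(edges, copy_number, "locus1")
```

Here only the edge from `root` to `a` carries a copy-number decrease, so it is the only place where losing an SNV at this locus would be considered.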
Motivation: A tumor arises from an evolutionary process that can be modeled as a phylogenetic tree. However, reconstructing this tree is challenging as most cancer sequencing uses bulk tumor tissue containing heterogeneous mixtures of cells. Results: We introduce Probabilistic Algorithm for Somatic Tree Inference (PASTRI), a new algorithm for bulk-tumor sequencing data that clusters somatic mutations into clones and infers a phylogenetic tree that describes the evolutionary history of the tumor. PASTRI uses an importance sampling algorithm that combines a probabilistic model of DNA sequencing data with an enumeration algorithm based on the combinatorial constraints defined by the underlying phylogenetic tree. As a result, tree inference is fast, accurate and robust to noise. We demonstrate on simulated data that PASTRI outperforms other cancer phylogeny algorithms in terms of runtime and accuracy. On real data from a chronic lymphocytic leukemia (CLL) patient, we show that a simple linear phylogeny better explains the data than the complex branching phylogeny that was previously reported. PASTRI provides a robust approach for phylogenetic tree inference from mixed samples. Availability and Implementation: Software is available at compbio.cs.brown.edu/software. Supplementary information: Supplementary data are available at Bioinformatics online.
Highlights
• Longitudinal sequencing provides additional information for phylogeny inference
• CALDER leverages longitudinal information to derive phylogeny from mixed samples
• CALDER yields more accurate trees on simulated and real cancer data
• Longitudinal model extendable to other data types such as single-cell sequencing
To investigate the genomic evolution of metastatic pediatric osteosarcoma, we performed whole-genome and targeted deep sequencing on 14 osteosarcoma metastases and two primary tumors from four patients (two to eight samples per patient). All four patients harbored ancestral (truncal) somatic variants resulting in TP53 inactivation and cell-cycle aberrations, followed by divergence into relapse-specific lineages exhibiting a cisplatin-induced mutation signature. In three of the four patients, the cisplatin signature accounted for >40% of mutations detected in the metastatic samples. Mutations potentially acquired during cisplatin treatment included NF1 missense mutations of uncertain significance in two patients and a KIT G565R activating mutation in one patient. Three of four patients demonstrated widespread ploidy differences between samples from the same patient. Single-cell seeding of metastasis was detected in most metastatic samples. Cross-seeding between metastatic sites was observed in one patient, whereas in another patient a minor clone from the primary tumor seeded both metastases analyzed. These results reveal extensive clonal heterogeneity in metastatic osteosarcoma, much of which is likely cisplatin-induced. Implications: The extent and consequences of chemotherapy-induced damage in pediatric cancers are unknown. We found that cisplatin treatment can potentially double the mutational burden in osteosarcoma, which has implications for optimizing therapy for recurrent, chemotherapy-resistant disease.