Power and pitfalls of computational methods for inferring clone phylogenies and mutation orders from bulk sequencing data

Abstract

Background. Tumors harbor extensive genetic heterogeneity in the form of distinct clone genotypes that arise over time and across different tissues and regions of a cancer patient. Many computational methods produce clone phylogenies from population bulk sequencing data collected from multiple tumor samples. These clone phylogenies are used to infer mutation order and clone origin times during tumor progression, which makes the choice of clonal deconvolution method critical. Surprisingly, the absolute and relative accuracies of these methods in inferring clone phylogenies have not been consistently assessed.

Methods. We evaluated the performance of seven computational methods in producing clone phylogenies for simulated datasets in which clones were sampled from multiple sectors of a primary tumor (multi-region) or from primary and metastatic tumors in a patient (multi-site). We assessed the accuracy of the tested methods in determining the order of mutations and the branching pattern of the reconstructed clone phylogenies.

Results. The accuracy of the reconstructed mutation order varied extensively among methods (9%–44% error). Methods also varied significantly in reconstructing the topologies of clone phylogenies, as 24%–58% of the inferred clone groupings were incorrect. All tested methods showed limited ability to correctly identify ancestral clone sequences present in tumor samples. The occurrence of multiple seeding events among tumor sites during metastatic tumor evolution hindered clone deconvolution for all tested methods.
Conclusions. Overall, CloneFinder, MACHINA, and LICHeE showed the highest accuracy, but none of the methods performed well for all simulated datasets and conditions.

Background

Somatic mutations play a crucial role in cancer progression [1-3]. Early models proposed that clones carrying driver mutations sweep through the population, a pattern called linear progression of clone evolution [4]. It is now clear that tumors are not monoclonal and that clonal evolution generally follows a branching model (i.e., incomplete clonal sweeps), even within a single tumor [4-10]. Metastatic tumors likewise follow a branching pattern [11, 12]. Clones found in primary and metastatic tumors show inter- and intra-tumor evolutionary relationships, which can be represented by a single-patient clone phylogeny [13-16] (e.g., Fig. 1g and 1h). The reconstruction and analysis of clone phylogenies have become standard practice in cancer genomics [16-26].

Clone phylogenies are most often inferred from bulk sequencing data [16, 27-30]. Bulk sequencing of tumor samples is cost-effective and can accurately identify single nucleotide variants (SNVs) [31, 32]. The result...