* These authors made equal contributions.

Correspondence to: patrick.cahan@jhmi.edu

Article type: Analysis

Website: http://www.cahanlab.org/resources/cancerCellNet_web

Code: https://github.com/pcahan1/cancerCellNet

ABSTRACT

Cancer researchers use cell lines, patient-derived xenografts, and genetically engineered mice as models to investigate tumor biology and to identify therapies. The generalizability and power of a model derive from the fidelity with which it represents the tumor type under investigation; however, the extent to which this is true is often unclear. The preponderance of models and the ability to readily generate new ones has created a demand for tools that can measure the extent and ways in which cancer models resemble or diverge from native tumors. Here, we present a computational tool, CancerCellNet, that measures the similarity of cancer models to 22 naturally occurring tumor types and 36 subtypes, in a platform- and species-agnostic manner. We applied this tool to 657 cancer cell lines, 415 patient-derived xenografts, and 26 distinct genetically engineered mouse models, documenting the most faithful models, identifying cancers underserved by adequate models, and finding models with annotations that do not match their classification. By comparing models across modalities, we find that genetically engineered mice have higher transcriptional fidelity than patient-derived xenografts and cell lines in four out of five tumor types. We have made CancerCellNet available as freely downloadable software and as a web application that can be applied to new cancer models.

Models are widely used to investigate cancer biology and to identify potential therapeutics.
Popular modeling modalities are cancer cell lines (CCLs) 1, genetically engineered mouse models (GEMMs) 2, and patient-derived xenografts (PDXs) 3. These classes of models differ in the types of questions that they are designed to address. CCLs are often used to address cell-intrinsic mechanistic questions 4, GEMMs to chart progression of molecularly defined disease 5, and PDXs to explore patient-specific response to therapy in a physiologically relevant context 6.
Models also differ in the extent to which they represent specific aspects of a cancer type 7.
Even with this intra- and inter-class model variation, all models should represent the tumor type or subtype under investigation, and not another type of tumor or a non-cancerous tissue.
Therefore, cancer models should be selected not only based on the specific biological question but also based on the similarity of the model to the cancer type under investigation 8,9.

Various methods have been proposed to determine the similarity of cancer models to their intended subjects. Domcke et al. devised a 'suitability score' as a metric of the molecular similarity of CCLs to high grade serous ovarian carcinom...