Abstract: A deep learning approach refines the state-of-the-art subtypes of colorectal cancer and evaluates the fidelity of cell lines that model cancer.
“…Cell lines are commonly used as models for tumors; however, it is an open question how to best apply the available cell line panels to learn about cancer biology. The availability of genomic data from large tumor cohorts and from cell line panels has spurred efforts to find which cell lines are closer to tumors by their transcriptomic (9,37,38) and/or genomic features (12,13), presumably making better models, and which are more distant from examples of actual tumors, thus making less good models.…”
Cell lines are commonly used as cancer models. The tissue of origin provides context for understanding biological mechanisms and predicting therapy response. We therefore systematically examined whether cancer cell lines exhibit features matching the presumed cancer type of origin. Gene expression and DNA methylation classifiers trained on ~9000 tumors identified 35 (of 614 examined) cell lines that better matched a different tissue or cell type than the one originally assigned. Mutational patterns further supported most reassignments. For instance, cell lines identified as originating from the skin often exhibited a UV mutational signature. We cataloged 366 “golden set” cell lines in which transcriptomic and epigenomic profiles strongly resemble the cancer type of origin, further proposing their assignments to subtypes. Accounting for the uncertain tissue of origin in cell line panels can change the interpretation of drug screening and genetic screening data, revealing previously unknown genomic determinants of sensitivity or resistance.
“…observed that OC316 was hyper-mutated (12), Sinha et al. found that SLR20 had an outlier copy number profile (57), and Ronen et al. found that COLO320 was dissimilar to colorectal tumors and lacked major colorectal cancer driver genes (58). In our analysis, all of these cell lines were also identified as being unlike their respective tumor types.…”
Cell lines are key tools for preclinical cancer research, but it remains unclear how well they represent patient tumor samples. Identifying cell line models that best represent the features of particular tumor samples, as well as tumor types that lack in vitro model representation, remain important challenges. Gene expression has been shown to provide rich information that can be used to identify tumor subtypes, as well as predict the genetic dependencies and chemical vulnerabilities of cell lines. However, direct comparisons of tumor and cell line transcriptional profiles are complicated by systematic differences, such as the presence of immune and stromal cells in tumor samples and differences in the cancer-type composition of cell line and tumor expression datasets. To address these challenges, we developed an unsupervised alignment method (Celligner) and applied it to integrate several large-scale cell line and tumor RNA-Seq datasets. While our method aligns the majority of cell lines with tumor samples of the same cancer type, it also reveals large differences in tumor/cell line similarity across disease types. Furthermore, Celligner identifies a distinct group of several hundred cell lines from diverse lineages that present a more mesenchymal and undifferentiated transcriptional state and which exhibit distinct chemical and genetic dependencies. This method could thus be used to guide the selection of cell lines that more closely resemble patient tumors and improve the clinical translation of insights gained from cell line models.
“…For breast cancer, cell line subtypes have been assigned mainly using Prediction Analysis for Microarrays (PAM) analysis, which is based on a restricted set of gene expression markers (38). For colorectal cancer, the cell lines were stratified into the consensus molecular subtypes (CMS) integrating transcriptomic and genomic data (39). For renal cancer, subtypes were assigned to the cell lines using gene expression data (14).…”
Section: Results
Classification: mentioning (confidence: 99%)
“…Cell lines are commonly used as models for tumors, however it is an open question how to best apply the available cell line panels to learn about cancer biology. The availability of genomic data from large tumor cohorts and from cell line panels has spurred multiple efforts to find which cell line(s) are closer to tumors by their transcriptomic (10,39,40,46) and/or genomic features (13,14), presumably making better models, and which are more distant from examples of actual tumors, presumably making less good models of tumor biology.…”