The inability to locate the primary tumor site in patients with cancer of unknown primary (CUP) is a major obstacle to providing personalized therapeutic options and access to clinical trials. Despite the recent use of molecular tools to identify the tumor tissue of origin (TOO), overall survival for CUP patients remains low. Here, we present an AI-based tool that predicts the TOO by using genomic and transcriptomic data to classify CUP tumors into hierarchically organized molecular subgroups. The TOO predictor comprised DNA, RNA, and Consensus classifiers, hierarchically organized by molecular diagnosis: upper-level clusters group tumors that share molecular features reflecting a similar cell of origin, and lower levels contain specific diagnoses for finer classification. The ML-based DNA classifier was trained on publicly available genomic data from 8,000 samples and independently validated on more than 5,500 samples. The ML-based RNA classifier was trained on publicly available transcriptomic data from more than 10,100 samples, with tumor- and normal-specific features for each cancer type, and independently validated on 20,000 samples. The Consensus classifier, which combines the outputs of the DNA and RNA classifiers, was trained on genomic and transcriptomic data from 1,000 samples and validated on an independent dataset of 2,000 samples. For each classifier, features were selected according to the weighted F1-score, and the best-performing hyperparameters were used for the final model. The three-classifier algorithm predicts the TOO for 33 solid-tumor types and subtypes, independently of sample source, sequencing method, and cohort.
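The abstract states that features were selected according to the weighted F1-score but does not describe the selection procedure. The sketch below is a hedged illustration, not the authors' method: it computes a support-weighted F1-score over cancer-type labels and pairs it with greedy forward selection, a swapped-in technique chosen only for illustration. The callback `evaluate` and the function names are hypothetical; in practice `evaluate` would train a classifier on the given feature subset and score it on held-out data.

```python
from collections import Counter
from typing import Callable, Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 scores averaged with weights equal to each class's true-label count."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score


def greedy_forward_selection(
    candidates: Sequence[str],
    evaluate: Callable[[list[str]], float],
    max_features: int,
) -> tuple[list[str], float]:
    """Add one feature at a time while the validation score (e.g. weighted F1) improves.

    `evaluate` is a hypothetical callback: it should fit a model on the given
    feature subset and return its weighted F1 on held-out samples.
    """
    selected: list[str] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        # max() returns the first candidate among ties
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no remaining candidate improves the score
        selected.append(feature)
        best_score = score
    return selected, best_score
```

For example, `weighted_f1(["lung", "lung", "breast", "colon"], ["lung", "breast", "breast", "colon"])` returns 0.75: lung (F1 2/3, weight 1/2), breast (F1 2/3, weight 1/4), and colon (F1 1, weight 1/4). Weighting by support keeps rare cancer types from dominating the score.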
Validation showed that the Consensus classifier performed better (F1-score 95%) than the DNA and RNA classifiers (F1-scores 79% and 93%, respectively), with high sensitivity (95%), specificity (99%), and precision (96%), as it takes into account both genomic events and expression patterns. The TOO predictor was prospectively validated with all classifiers on approximately 298 clinical samples with a known diagnosis. A diagnosis was called in 99% of clinical cases (295/297) by the Consensus classifier, with 90% sensitivity and 94% precision. The call rate for the DNA and RNA classifiers was above 95%. Of note, the sensitivity of the top 4 predicted diagnoses was >90% for all 3 classifiers, and the calculated rule-out accuracy of the Consensus classifier was 97%. In conclusion, we developed an ML-based algorithm that uses genomic and transcriptomic data to accurately predict the TOO of CUP tumors. Applying the Consensus classifier after the DNA and RNA classifiers helps identify the TOO with high specificity, which can guide precision oncology treatment decisions. Citation Format: Zoia Antysheva, Daria Kiriy, Anton Sivkov, Alexander Sarachackov, Alexandra Boyko, Naira Samarina, Nara Shin, Jessica H. Brown, Ivan Kozlov, Viktor Svekolkin, Alexander Bagaev, Nathan Fowler, Nikita Kotlov. An ML-based tool for predicting tissue of origin for cancer of unknown primary (CUP) based on genomic and transcriptomic data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5405.
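The validation paragraph reports sensitivity, specificity, precision, top-4 sensitivity, and call rate. As a hedged sketch of how such quantities are commonly defined (the abstract does not say whether per-class metrics were macro- or support-averaged, or how "rule-out accuracy" was computed, so neither is attempted here), the helpers below treat one cancer type as the positive class and represent a missing call as `None`. All function names are illustrative.

```python
from typing import Optional, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else 0.0


def one_vs_rest_metrics(
    y_true: Sequence[str], y_pred: Sequence[Optional[str]], label: str
) -> dict[str, float]:
    """Sensitivity, specificity and precision with `label` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == label and pred == label:
            tp += 1
        elif pred == label:
            fp += 1
        elif truth == label:
            fn += 1  # includes samples with no call (pred is None)
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
    }


def top_k_sensitivity(
    y_true: Sequence[str], ranked_predictions: Sequence[Sequence[str]], k: int
) -> float:
    """Fraction of samples whose true type is among the k highest-ranked predictions."""
    if not y_true or len(y_true) != len(ranked_predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for truth, ranked in zip(y_true, ranked_predictions) if truth in ranked[:k])
    return hits / len(y_true)


def call_rate(predictions: Sequence[Optional[str]]) -> float:
    """Fraction of samples for which the classifier returned any call."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    return sum(p is not None for p in predictions) / len(predictions)
```

For instance, 295 calls out of 297 samples gives `call_rate` = 295/297, about 0.993. Top-k sensitivity credits a prediction whenever the true diagnosis appears anywhere in the first k ranked candidates, which is why it can exceed top-1 sensitivity.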
Human papillomavirus (HPV)-associated Head and Neck Squamous Cell Carcinoma (HPV+HNSCC) is now the most common HPV-associated malignancy in the United States. Current treatments can cause severe side effects or lack efficacy, yet prognostic biomarkers are limited, slowing efforts to personalize treatment in HPV+HNSCC. Here, we describe the use of a transcriptome-based analytical platform to analyze the expression patterns of viral transcripts, the tumor microenvironment (TME), and viral genome integration, and to associate these features with overall survival. Functional gene expression signatures were analyzed in publicly available HPV+HNSCC expression data (n=266). Unsupervised clustering analysis revealed 5 distinct, novel TME types across patients (immune-enriched non-fibrotic, immune-enriched fibrotic, fibrotic, immune-desert, and immune-enriched luminal). These microenvironment subtypes were strongly associated with overall survival and prognosis. Tumors with an immune-enriched microenvironment showed the highest survival rates, whereas fibrotic TME types were associated with poor survival (p < 0.05). Unsupervised clustering of an HPV+HNSCC cohort from The Cancer Genome Atlas (TCGA; n=53) based on HPV transcript expression revealed 4 HPV-related subtypes, each enriched for distinct viral transcripts: E2/E5, E6/E7, E1/E4, and L1/L2. We then validated the TME and HPV transcript-based classifications in an independent HPV+HNSCC cohort (n=132). Using both viral transcript and TME subtypes, we found that the E2/E5 HPV subtype was associated with an immune-enriched TME and with higher overall survival than the other subtypes. The E2/E5 subtype was also enriched for samples without HPV genome integration, suggesting that episomal HPV DNA status and the E2/E5 expression pattern may drive an inflamed microenvironment and improved prognosis.
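The abstract does not state how the functional gene expression signatures were scored before clustering. As a hedged stand-in, the sketch below uses a simple and common approach: standardize each gene across samples (z-score) and score a signature as the mean z-score of its member genes. Gene names and signature names in the usage example are hypothetical placeholders, and the resulting per-sample score vectors are what an unsupervised clustering step would then take as input.

```python
from statistics import mean, pstdev


def zscore_rows(expression: dict[str, list[float]]) -> dict[str, list[float]]:
    """Standardize each gene across samples; constant genes map to all zeros."""
    scaled = {}
    for gene, values in expression.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def signature_scores(
    expression: dict[str, list[float]], signatures: dict[str, list[str]]
) -> dict[str, list[float]]:
    """Score every sample for every signature as the mean z-score of member genes present."""
    scaled = zscore_rows(expression)
    n_samples = len(next(iter(expression.values())))
    scores = {}
    for name, genes in signatures.items():
        present = [g for g in genes if g in scaled]
        if not present:
            raise ValueError(f"no genes of signature {name!r} found in the data")
        scores[name] = [mean(scaled[g][i] for g in present) for i in range(n_samples)]
    return scores


# Hypothetical usage: three samples, a two-gene T-cell signature, a one-gene fibroblast signature
example = signature_scores(
    {"CD3E": [1.0, 2.0, 3.0], "CD8A": [2.0, 4.0, 6.0], "COL1A1": [5.0, 5.0, 5.0]},
    {"T_cells": ["CD3E", "CD8A"], "Fibroblasts": ["COL1A1"]},
)
```

Z-scoring first puts genes with very different absolute expression levels on a common scale, so a single highly expressed gene does not dominate the signature score.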
In contrast, E6/E7 subtype samples were associated with the fibrotic and depleted (immune-desert) TME types, lower T-cell and B-cell gene expression signature scores, and lower survival. Both the E1/E4 and L1/L2 subtypes were associated with the immune-enriched luminal TME type. These findings suggest that HPV transcript expression patterns may modulate the TME and thereby affect prognosis. Further validation of the relationships between viral gene expression, TME, and prognosis is warranted to determine whether such subtypes could aid the development of prognostic biomarkers for treatment selection. Citation Format: Daria Kiriy, Dmitry Tychinin, Nikita Kotlov, Olga Kudryashova, Anastasia Nikitina, Andrey Tyshevich, Naira Samarina, Ksenia Demina, Sandrine Degryse, Susan Raju Paul, Mark Poznansky, Krystle Lang Kuhs, James S. Lewis, Robert L. Ferris, Xiaowei Wang, Alexander Bagaev, Nathan Fowler, Lori Wirth, Daniel Faden. Viral transcript and tumor immune microenvironment-based transcriptomic profiling of HPV-associated head and neck cancer identifies subtypes associated with prognosis [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 3823.
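Survival differences between subtypes, as described for the HPV transcript and TME groups, are conventionally visualized with Kaplan-Meier curves. The abstract does not name its survival method, so the following is a hedged, self-contained sketch of the standard Kaplan-Meier product-limit estimator; it would be run once per subtype. The follow-up times in the usage example are invented for illustration only.

```python
from typing import Sequence


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> list[tuple[float, float]]:
    """Product-limit survival estimate S(t) at each distinct time with at least one event.

    events[i] is True if death was observed at times[i], False if censored.
    Censored samples tied with a death are still counted as at risk at that time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: list[tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both deaths and censored samples leave the risk set
    return curve


# Hypothetical follow-up (months) for one subtype; False marks a censored patient
curve = kaplan_meier([6, 12, 12, 20, 30], [True, True, False, True, False])
```

Here the curve steps to 0.8 at 6 months, 0.6 at 12 months, and 0.3 at 20 months; the censored patient at 30 months produces no step. Comparing such curves between subtypes usually also involves a test such as the log-rank test, which is not shown.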
CUP is a relatively common diagnosis, accounting for 3-9% of all cancers. The prognosis is poor, with a median survival of approximately 9 months. Identifying the primary tumor and therapeutic targets could improve survival in these patients. We used the BostonGene Tumor Portrait™ platform to interrogate 19 CUP cases. Trained on >19,000 samples and validated on 28,000 samples from independent datasets, the platform's machine-learning-based algorithm integrates whole-exome and RNA sequencing (WES and RNAseq) analysis to characterize cancer drivers, the tumor microenvironment, potential targets, tumor composition, molecular signatures, and site of origin (94% sensitivity and 99% specificity). The predicted tissue of origin was considered acceptable when it was compatible with the tumor histopathology and immunoprofile and was included in the differential diagnosis from radiologic studies. The discovered biomarkers and possible treatments were discussed at our multidisciplinary precision oncology meeting. Sixteen of 19 CUP cases (84%) had an acceptable predicted tissue of origin; 2 cases lacked clinical evidence supporting the predicted primary tumor, and 1 case failed RNAseq. Lung was the most frequent site of origin (31%), followed by gastrointestinal (15%) and breast (8%) cancers. Other diagnoses included melanoma and uterine, bladder, and renal carcinomas, among others. A clinically significant biomarker or target was found in all but one case. These included relevant mutational signatures (e.g., homologous recombination deficiency, DNA mismatch repair deficiency), genomic characteristics (high tumor mutational burden (TMB), microsatellite instability), activating alterations (FGFR1, MYC, and ERBB2 amplifications; NCOA4::RET fusion), loss of function in tumor suppressor genes (TP53, FANCA, ATM), and gene overexpression (ER). Further, microenvironment analysis characterized the tumor immune infiltrate and the RNA expression levels of PD-L1, PD-L2, and CTLA4.
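High tumor mutational burden is one of the biomarkers listed above. TMB is generally expressed as somatic coding mutations per megabase of sequenced territory, but which variant classes are counted, the size of the callable region, and the "high" cutoff all vary between assays, and the abstract specifies none of them. The sketch below is therefore a hedged illustration: `COUNTED_CONSEQUENCES`, `SomaticVariant`, and the 10 mutations/Mb default in `is_tmb_high` are assumptions, not the platform's definitions.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: illustrative set of variant classes counted toward TMB; real assays differ.
COUNTED_CONSEQUENCES = frozenset(
    {"missense", "nonsense", "frameshift", "inframe_indel", "splice_site"}
)


@dataclass(frozen=True)
class SomaticVariant:
    """Hypothetical minimal record for a somatic variant call."""
    gene: str
    consequence: str  # e.g. "missense", "synonymous"


def tumor_mutational_burden(variants: Iterable[SomaticVariant], callable_megabases: float) -> float:
    """Counted somatic mutations divided by the callable sequenced territory in Mb."""
    if callable_megabases <= 0:
        raise ValueError("callable_megabases must be positive")
    counted = sum(1 for v in variants if v.consequence in COUNTED_CONSEQUENCES)
    return counted / callable_megabases


def is_tmb_high(tmb: float, threshold: float = 10.0) -> bool:
    # Assumption: 10 mutations/Mb is an illustrative cutoff, not one stated in the abstract.
    return tmb >= threshold
```

Dividing by the callable territory rather than by a nominal exome size keeps TMB comparable between samples with different coverage, which matters when WES results are compared against targeted panels.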
These oncologic biomarkers and potential targets are of significant value regardless of tissue of origin. In addition, mutations in NF1, KRAS, TP53, MSH2, BRCA1, and PTEN were found and validated by commercially available targeted NGS panels. Based on CUP tumor profiling with this platform, positive treatment responses have been observed in 3 of 4 CUP patients thus far; for example, one patient with metastatic disease whose tumor showed high TMB and an immune-infiltrated microenvironment had a sustained response to ipilimumab and nivolumab. Because oncologic therapy is often determined by tissue of origin, CUP is a therapeutic challenge. In this study, we demonstrate the application of an integrative WES and RNAseq platform not only to predict the site of origin but also to identify relevant biomarkers and therapeutic targets in CUP. Citation Format: Majd Al Assaad, Michael Sigouros, Jyothi Manohar, Daniela Guevara, Zoia Antysheva, Daria Kiriy, Alexandra Boyko, Naira Samarina, Nara Shin, Viktor Svekolkin, Svetlana Podsvirova, Noel English, Alaina Villarreal, Brianna McKenna, Cagdas Tazearslan, Diana Shamsutdinova, Vladimir Kushnarev, Troy Kane, David Wilkes, Manish Shah, Barbara Ma, Scott T. Tagawa, David Nanus, Jones Nauseef, Olivier Elemento, Juan Miguel Mosquera, Cora N. Sternberg. Cancer of unknown Primary (CUP): Beyond the identification of the site of origin by an integrative genomic approach [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 2143.