The aggressiveness of a tumor depends on its genomic profile. Accordingly, the overall survival (OS) of cancer patients should also depend on this profile, in particular on the number and nature of mutations and on the level of gene activity. In this work, we try to predict OS from the genomic profile of the tumor, using both primary DNA data and RNA activity. One objective of the study is to compare which of these two types of input data better predicts OS. DNA and gene expression data were taken from the pan-cancer TCGA database (33 cancer types) and split into 2 datasets: DNA data only and expression data only. In the DNA data, we selected only pathogenic and likely pathogenic variants; the 1806 genes containing these mutations were used as features. In the expression data, we selected only the genes that belong to cancer-related pathways in the KEGG database (1821 genes). For both datasets, 3-year OS was chosen as the predicted effect: a patient whose OS exceeded three years was considered a positive example, otherwise a negative one. The DNA dataset contained 2159 positive and 1687 negative examples; the expression dataset contained 3363 positive and 2212 negative examples. The machine learning algorithms were implemented in Python 3. To determine the significance of the features, we used Lasso linear regression with 5-fold cross-validation. The result was a list of genes ordered by decreasing importance for the effect. In the DNA dataset, the algorithm selected 64 significant genes, each with a sign (plus or minus) indicating whether it pushes toward the positive or the negative effect, and a coefficient indicating the relative strength of its influence. For example, age 81-90 and EGFR mutations were at the negative end of the scale, while stage I and HRAS mutations were at the positive end.
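The labeling and Lasso ranking steps described above can be illustrated with a small from-scratch sketch. The abstract gives no implementation details, so everything here is an assumption: OS is taken to be in days, censoring is ignored when building the 3-year label, the penalty grid and iteration count are arbitrary, and the data are synthetic stand-ins for gene features. The code fits Lasso by coordinate descent on the 0/1 label, picks the penalty by 5-fold cross-validated squared error, and returns the non-zero coefficients ordered by decreasing magnitude, so that each feature carries a sign and a relative strength.

```python
import numpy as np

THREE_YEARS_DAYS = 3 * 365  # assumption: OS is recorded in days


def three_year_label(os_days):
    """1.0 if OS exceeds three years (positive example), else 0.0.
    Simplification: censored follow-up is not treated specially."""
    return (np.asarray(os_days, dtype=float) > THREE_YEARS_DAYS).astype(float)


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1.
    Expects standardized columns in X and a centered y."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = np.maximum((X ** 2).sum(axis=0) / n, 1e-12)
    r = y.copy()  # residual y - Xw; w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * w[j]  # remove feature j from the current fit
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            r -= X[:, j] * w[j]  # add it back with the updated weight
    return w


def fit_standardized(X, y, alpha):
    """Standardize X, center y, fit Lasso; return (mean, sd, y_mean, weights)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    y_mean = y.mean()
    return mu, sd, y_mean, lasso_cd((X - mu) / sd, y - y_mean, alpha)


def lasso_cv_rank(X, y, alphas=(0.001, 0.01, 0.05, 0.1), k=5, seed=0):
    """Pick alpha by k-fold CV squared error, refit on all data, and return
    (alpha, [(feature index, coefficient), ...]) for non-zero coefficients
    ordered by decreasing |coefficient|."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    cv_err = []
    for a in alphas:
        err = 0.0
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            mu, sd, y_mean, w = fit_standardized(X[train], y[train], a)
            pred = (X[test] - mu) / sd @ w + y_mean
            err += float(((y[test] - pred) ** 2).sum())
        cv_err.append(err / n)
    best = alphas[int(np.argmin(cv_err))]
    _, _, _, w = fit_standardized(X, y, best)
    order = np.argsort(-np.abs(w))
    return best, [(int(j), float(w[j])) for j in order if w[j] != 0.0]


# Synthetic demo: feature 0 raises and feature 1 lowers the chance of a positive label.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 20))
y = (2 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=300) > 0).astype(float)
alpha, ranked = lasso_cv_rank(X, y)
print(alpha, ranked[:2])
```

A library implementation such as scikit-learn's `LassoCV` would normally replace the hand-written solver; it is written out here only to make the mechanics visible.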
In the RNA dataset, the algorithm selected 75 such genes. At the negative end of the scale were age 81-90 and changes in CDK6 expression; at the positive end were stage I and changes in RPS6 expression. Only 11 significant features were shared by the two datasets. To predict the effect, we used logistic regression with 5-fold cross-validation. Receiver operating characteristic (ROC) curves, which reflect the sensitivity and specificity of the classification, were evaluated by the area under the curve (AUC). For the DNA dataset, the mean ROC-AUC over the 5 folds was 0.72 (0.64-0.77); for the RNA dataset, it was 0.74 (0.69-0.77). Predicting overall survival is essential for planning treatment strategies and selecting patients for clinical trials. The sufficiently high classification quality shows that this approach is worth developing further. Further tuning of the algorithms should make it possible to predict the effect more accurately, and combinations of different input data need to be tested. The list of important genes can help identify molecular targets in drug discovery. Citation Format: Dmitrii K. Chebanov, Nadezhda S. Tatevosova, Irina N. Mikhaylova. Machine learning for predicting overall survival using whole exome DNA and gene expression data and analyzing the significance of features [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-045.
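The evaluation protocol of this abstract (logistic regression, 5-fold cross-validation, per-fold ROC-AUC summarized as a mean with a range) can be sketched in NumPy. This is not the authors' code: the optimizer (plain gradient descent with a small L2 penalty), the learning rate, and the synthetic data are assumptions made for illustration. AUC is computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, which equals the area under the ROC curve.

```python
import numpy as np


def fit_logistic(X, y, lr=0.5, n_iter=500, l2=1e-3):
    """Batch gradient descent on the mean log-loss with a small L2 penalty."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = prob - y  # derivative of the log-loss w.r.t. the linear score
        w -= lr * (X.T @ grad / n + l2 * w)
        b -= lr * grad.mean()
    return w, b


def roc_auc(y_true, score):
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    pos, neg = score[y_true == 1], score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


def cv_auc(X, y, k=5, seed=0):
    """k-fold CV; returns (mean, min, max) of the per-fold ROC-AUC."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    aucs = []
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        sd[sd == 0] = 1.0
        w, b = fit_logistic((X[train] - mu) / sd, y[train])
        # The linear score is monotone in the predicted probability, so it gives the same AUC.
        aucs.append(roc_auc(y[test], (X[test] - mu) / sd @ w + b))
    return float(np.mean(aucs)), float(min(aucs)), float(max(aucs))


print(roc_auc(np.array([0, 0, 1, 1]), np.array([0.1, 0.4, 0.35, 0.8])))  # 0.75

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 10))  # synthetic stand-in for gene features
y = (X[:, 0] - X[:, 1] + 1.5 * rng.normal(size=400) > 0).astype(float)
mean_auc, lo_auc, hi_auc = cv_auc(X, y)
print(f"mean ROC-AUC {mean_auc:.2f} ({lo_auc:.2f}-{hi_auc:.2f})")
```

Reporting the minimum and maximum fold AUC next to the mean mirrors the "0.72 (0.64-0.77)" style of the results, under the assumption that the parenthesized values are the fold range.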
Patient data: We had 39 patients in 2 categories: those with an objective response to dendritic cell vaccine therapy (20) and those with disease progression and no response (19). For every patient, 21 biomarkers (antigen concentrations) served as features. A positive effect means that the patient responded to therapy. The feature data are quantitative (continuous), but we made them categorical by defining 6 intervals: each biomarker was replaced with 6 encoding (‘dummy’) variables taking the value 1 or 0 depending on whether the patient’s biomarker value falls in that interval. Methods: The machine learning algorithm for response prediction is the JSM method for automatic support of scientific research (JSM method ASSR). It performs plausible reasoning by generating hypotheses and keeping only those that survive each enlargement of the database. The reasoning is based on the similarity of objects, obtained by intersecting the patients’ (objects’) feature sets with set-theoretic operations. Each object is represented by a set of features, and hypotheses about its membership in a class are also sets of features specific to that class; a separate set of hypotheses is therefore generated for each class. At the prediction stage, each object to be classified is checked for how many hypotheses it contains, in other words, how many hypotheses are subsets of the object. The prediction depends on which class’s hypotheses prevail among those contained in the object. This kind of machine learning approach also shows why a particular object is assigned to its class, so it can be used not only for classification but also for knowledge discovery about the reasons behind an effect.
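The set-based reasoning described above can be sketched in plain Python. This is a simplified JSM-style illustration, not the JSM method ASSR implementation used in the study: the cut points, biomarker values, and the rule that discards a hypothesis contained in any object of the opposite class are assumptions, and the database-enlargement and permutation steps are omitted. Each patient becomes a set of (biomarker, interval) items, which is equivalent to the 0/1 dummy coding; hypotheses are intersections of two or more same-class objects; prediction counts the hypotheses of each class that are subsets of the object and returns no prediction on a tie.

```python
from itertools import combinations

# Hypothetical cut points: 5 cuts per biomarker give 6 intervals (indices 0..5).
CUTS = {
    "CD4": [10, 20, 30, 40, 50],
    "CD8": [10, 20, 30, 40, 50],
    "IRI": [0.1, 0.2, 0.3, 0.4, 0.5],
}


def encode(sample):
    """Set form of the 0/1 dummy coding: (biomarker, interval) is present iff its dummy is 1."""
    return frozenset((name, sum(value >= c for c in CUTS[name])) for name, value in sample.items())


def similarities(objects):
    """All non-empty intersections of two or more objects of one class, built as a closure."""
    objects = list(objects)
    found = {a & b for a, b in combinations(objects, 2)} - {frozenset()}
    frontier = set(found)
    while frontier:
        new = {h & o for h in frontier for o in objects} - found - {frozenset()}
        found |= new
        frontier = new
    return found


def hypotheses(own, other):
    """Keep similarities that are not contained in any object of the opposite class."""
    return {h for h in similarities(own) if not any(h <= o for o in other)}


def predict(obj, pos_hyp, neg_hyp):
    """Return (label, reasons): 1/0 for the class whose hypotheses prevail, None on a tie or no match."""
    pos_in = [h for h in pos_hyp if h <= obj]  # hypotheses that are subsets of the object
    neg_in = [h for h in neg_hyp if h <= obj]
    if len(pos_in) > len(neg_in):
        return 1, pos_in
    if len(neg_in) > len(pos_in):
        return 0, neg_in
    return None, pos_in + neg_in


responders = [encode(s) for s in (
    {"CD4": 12, "CD8": 25, "IRI": 0.05},
    {"CD4": 15, "CD8": 28, "IRI": 0.35},
    {"CD4": 18, "CD8": 55, "IRI": 0.38},
)]
non_responders = [encode(s) for s in (
    {"CD4": 60, "CD8": 22, "IRI": 0.02},
    {"CD4": 55, "CD8": 45, "IRI": 0.08},
)]
pos_hyp = hypotheses(responders, non_responders)
neg_hyp = hypotheses(non_responders, responders)

label, reasons = predict(encode({"CD4": 70, "CD8": 35, "IRI": 0.0}), pos_hyp, neg_hyp)
print(label, sorted(sorted(r) for r in reasons))  # 0 [[('CD4', 5), ('IRI', 0)]]
```

The returned `reasons` are the hypotheses that triggered the prediction, which is the sense in which this family of methods explains its classifications.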
We divided the database into 2 batches for learning, a source base (18 objects) and a first enlargement (17 objects), and the remaining 4 objects were left for testing. The source base and its enlargement were permuted during learning for greater reliability and robustness. We applied cross-validation in which each object was in the test group at least once. This gave 10 learning runs with predictions: 9 with 4 test examples and 1 with 3 test examples. Results: Across all 10 cross-validation runs, there were 26 correct predictions, 5 failures (cases with no prediction), 5 false-positive predictions, and 3 false-negative ones. The recall of the model was 85%, the precision was 77%, and the F1 score was 0.81. We also obtained reasons that were common to all database permutations: patients who will not respond to therapy are characterized by a CD8 value in the interval 39.9-54.1 and an IRI value in the interval 0.29-0.7. Discussion: 39 samples are a small amount of data even for the JSM method ASSR, but we showed that the described approach is suitable for prediction from quantitative data and for extracting reasons. Enlarging the source database should make it possible to obtain better results. Citation Format: Dmitrii K. Chebanov, Irina N. Mikhaylova, Nadezhda S. Tatevosova. Method for predicting the effectiveness of the developed immune dendritic cell vaccine in melanoma patients based on cell surface antigens and machine learning with non-classical logic [abstract]. In: Abstracts: AACR Virtual Special Conference: Tumor Immunology and Immunotherapy; 2020 Oct 19-20. Philadelphia (PA): AACR; Cancer Immunol Res 2021;9(2 Suppl):Abstract nr PO086.
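The fold layout and the reported metrics of the dendritic cell vaccine abstract can be checked with a few lines of arithmetic. The count of true positives is not reported; the value 17 used here is an assumption derived from the reported 85% recall with 3 false negatives (17 / (17 + 3) = 0.85), which also matches the 20 responders. The remaining 9 correct predictions are then true negatives, and the 5 failures do not enter precision or recall.

```python
def fold_sizes(n, k):
    """Sizes of k near-equal folds covering n objects (each object is tested once)."""
    q, r = divmod(n, k)
    return [q + 1] * r + [q] * (k - r)


def recall_precision_f1(tp, fp, fn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return recall, precision, 2 * precision * recall / (precision + recall)


# Reported over all 10 runs: 26 correct, 5 failures, 5 FP, 3 FN.
# Derived (assumption): tp = 17 of the 26 correct predictions, so tn = 9.
tp, fp, fn, failures = 17, 5, 3, 5
tn = 26 - tp
print(fold_sizes(39, 10))  # [4, 4, 4, 4, 4, 4, 4, 4, 4, 3]
recall, precision, f1 = recall_precision_f1(tp, fp, fn)
print(f"recall={recall:.2f} precision={precision:.2f} F1={f1:.2f}")  # recall=0.85 precision=0.77 F1=0.81
```

All outcomes add up to the 39 patients (17 + 9 + 5 + 3 + 5), and the recomputed values agree with the reported 85%, 77%, and 0.81.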
Background: Driver mutations are traditionally considered actionable biomarkers for targeted drugs, but resistance and relapse often occur even when these events are precisely detected. At the same time, primary DNA mutations may be only the triggers of malignant transformation, with further tumor development driven by subsequent pathway imbalance, which may be reflected in gene expression. The goal is to detect preaffected pathways that are closest to the oncogenically affected state, so that during treatment planning these pathways can be considered as the next potential targets after nonresponse or relapse. Methods: We took whole-exome sequencing and RNA expression data for 33 cancers from the TCGA Pan-Cancer Atlas. Mutations were filtered based on their pathogenicity (1, 2). The training set included data on mutations and the corresponding RNA levels of 1821 cancer pathway-related genes (3). The ML method, logistic regression with 5-fold cross-validation and a test set, was implemented in Python 3.7. Results: Using gene expression data, the 9 most common actionable events were predicted with an accuracy of 80%-93%: oncogenic mutations affecting the Ras, Raf, Ras/Raf/MEK, PI3K, and CDK protein families, and amplifications of the EGFR, ERBB2, and CDK4 genes. The results were the probabilities of these events; the 10-30% occurrence range is shown. Discussion: We considered the obtained probabilities of molecular events as scores of the corresponding pathways’ malfunction. For some molecular events, more than one-third of patients had a score above 10% for an affected (unbalanced) pathway state. After validation, this approach could be used in clinical research practice for risk stratification of patient cohorts, or as additional support for companion diagnostic tests for drugs. References: 1. COSMIC; 2. Chakravarty et al., 2017a; 3. KEGG. Citation Format: Dmitrii Chebanov, Nadezhda Tatevosova, Irina Mikhaylova. Identifying actionable pathway malfunction scores with ML algorithm for omics data [abstract]. In: Proceedings of the AACR Special Conference on the Microbiome, Viruses, and Cancer; 2020 Feb 21-24; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2020;80(8 Suppl):Abstract nr A32.
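The idea of reading a predicted event probability as a pathway malfunction score, and then counting how many patients exceed 10%, can be sketched in plain Python. The per-event models below are hypothetical: the event names, weights, biases, and two-gene expression rows are invented for illustration and stand in for logistic regressions fitted on the 1821 pathway genes. Only the scoring and thresholding logic reflects the abstract.

```python
import math


def malfunction_scores(rows, weights, bias):
    """Predicted event probability per patient from a logistic model, used as the pathway score."""
    return [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(weights, row)) + bias))) for row in rows]


def fraction_above(scores, threshold=0.10):
    """Share of patients whose score exceeds the threshold (10% in the abstract)."""
    return sum(s > threshold for s in scores) / len(scores)


# Toy expression values (two genes per patient) and hypothetical per-event models (weights, bias).
rows = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [0.5, 2.0]]
models = {
    "event_A": ([1.5, -0.5], -3.0),
    "event_B": ([0.2, 0.1], -4.0),
}
shares = {name: fraction_above(malfunction_scores(rows, w, b)) for name, (w, b) in models.items()}
# Events for which more than one-third of patients have a score above 10%.
flagged = [name for name, share in shares.items() if share > 1 / 3]
print(shares, flagged)  # {'event_A': 0.5, 'event_B': 0.0} ['event_A']
```

Keeping the raw probability rather than a hard 0/1 call is what allows a patient to be described as partially affected (for example, a 10-30% score) instead of simply mutated or not.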
Introduction: Extranodal NK/T cell lymphoma (NKTCL) is an aggressive malignant lymphoma that is prevalent in East Asia. Despite promising improvements, there remains an urgent need for novel agents to improve the dismal prognosis of relapsed and refractory patients. Bendamustine combines the properties of a purine analogue and an alkylating agent and has been used successfully for B lymphoid neoplasms. However, its use in T cell lymphoma is limited, especially in NKTCL. Herein, we describe for the first time the potential efficacy of bendamustine in NKTCL. Methods and materials: NKTCL cell lines (NKYS, KHYG-1, NKL) were obtained from ExPASy. Bendamustine, chloroquine, concanamycin A and bafilomycin were purchased from Selleckchem. Cell proliferation