Abstract: (1) Background: Machine learning (ML) methods are rarely used for omics-based prescription of cancer drugs, owing to a shortage of case histories with clinical outcomes supplemented by high-throughput molecular data. This causes overtraining and high vulnerability of most ML methods. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irr…
“…Several methods were proposed for the assessment of drug efficiency based on gene/protein expression [ 16 , 17 , 18 , 19 ] or mutation patterns [ 20 , 21 , 22 ]. Unfortunately, most such methods are either proprietary or employ machine learning on preceding cases [ 23 , 24 , 25 , 26 ]. So, for evaluating a cannabis drug’s individual action, we have suggested a novel approach, the cannabis drug efficiency index (CDEI).…”
There are many varieties of Cannabis sativa that differ from each other in their composition of cannabinoids, terpenes and other molecules. The medicinal properties of these cultivars are often very different, with some being more efficient than others. This report describes the development of a method and software for analyzing the efficiency of various cannabis extracts, with a focus on detecting their anti-inflammatory properties. The method uses high-throughput gene expression profiling data but can potentially use other omics data as well. The gene expression profiles are converted into signaling pathway activities according to the signaling pathway topology, using a signaling pathway impact analysis (SPIA) method. The method was tested by inducing inflammation in human 3D epithelial tissues (intestinal, oral and skin), exposing these tissues to various extracts, and then performing transcriptome analysis. The analysis showed that the extracts differed in how efficiently they restored the transcriptome to the pre-inflammation state, which made it possible to calculate a distinct cannabis drug efficiency index (CDEI) for each extract.
“…Among these three categories, reinforcement learning is relatively less used for multi-omics data analysis. Developing the methodologies is an active area of research ( 21 – 25 ). Pan-cancer analysis is also being done.…”
Cancer is the manifestation of abnormalities in different physiological processes involving genes, DNA, RNA, proteins, and other biomolecules whose profiles are reflected in different omics data types. As these bio-entities are strongly correlated, integrative analysis of different types of omics data (multi-omics data) is required to understand the disease from tumorigenesis to disease progression. Artificial intelligence (AI), specifically machine learning algorithms, can make decisive interpretations of large, complex data and, hence, appears to be the most effective tool for the analysis and understanding of multi-omics data for patient-specific observations. In this review, we discuss the recent outcomes of employing AI in multi-omics data analysis of different types of cancer. Based on research trends and significance in patient treatment, we focus primarily on AI-based analysis for determining cancer subtypes, disease prognosis, and therapeutic targets. We also discuss AI analysis of some non-canonical types of omics data, as they can play a determining role in cancer patient care. Additionally, we briefly discuss data repositories because of their pivotal role in multi-omics data storage, processing, and analysis.
“…Many ML methods may be used for such applications, e.g. decision trees [12,13], random forests (RF) [14,15], linear [16], logistic [17], lasso [18,19], and ridge [15,20] regressions, multi-layer perceptron (MLP) [12,15,21,22], support vector machines [12,13,15,23–25], adaptive boosting [26–28], as well as the binomial naïve Bayesian [15] method.…”
Section: Introduction
“…Intelligent data filtering is, therefore, needed to reduce dimensionality of data [8]. However, a recent approach using dynamic feature extraction, or flexible data trimming, can significantly improve performances of ML-based methods for the real-world datasets [15,25].…”
Background
Machine learning (ML) methods still have limited applicability in personalized oncology because few clinically annotated molecular profiles are available. This prevents sufficient training of ML classifiers that could be used to improve molecular diagnostics.
Methods
We reviewed published datasets of high-throughput gene expression profiles from cancer patients with known responses to chemotherapy treatments. We browsed the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and Tumor Alterations Relevant for GEnomics-driven Therapy (TARGET) repositories.
Results
We identified data collections suitable for building ML models that predict responses to certain chemotherapeutic schemes. We identified 26 datasets, ranging from 41 to 508 cases per dataset. All the identified datasets were checked for ML applicability and robustness with leave-one-out cross-validation. Twenty-three datasets with balanced numbers of treatment responder and non-responder cases were found suitable for ML.
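As an illustration of the leave-one-out check mentioned above, the following is a minimal sketch only. The nearest-centroid classifier, the function names, and the toy two-feature data are assumptions made for the example; the abstract does not state which classifiers or features were used.

```python
import math


def nearest_centroid_predict(train_x, train_y, sample):
    """Return the label whose class centroid is closest to `sample`.

    A deliberately simple stand-in classifier (an assumption for this sketch).
    """
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: math.dist(centroids[lab], sample))


def leave_one_out_accuracy(features, labels):
    """Hold out each case once, train on the rest, and score the held-out case."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical toy "expression" values for six cases (not real data).
features = [[1.0, 0.9], [1.2, 1.1], [0.9, 1.0],
            [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
labels = ["responder"] * 3 + ["non-responder"] * 3

accuracy = leave_one_out_accuracy(features, labels)
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

With datasets of only tens to hundreds of cases, holding out one case at a time uses nearly all of the data for training in each round, which is the usual reason for preferring this scheme over a single train/test split.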
Conclusions
We collected a database of gene expression profiles associated with clinical responses to chemotherapy for 2786 individual cancer cases. Of these datasets, seven included RNA sequencing data (645 cases), and the others contained microarray expression profiles. The cases represented breast cancer, lung cancer, low-grade glioma, endothelial carcinoma, multiple myeloma, adult leukemia, pediatric leukemia and kidney tumors. Chemotherapeutics included taxanes, bortezomib, vincristine, trastuzumab, letrozole, tipifarnib, temozolomide, busulfan and cyclophosphamide.