Classifying cancer types is an essential step in designing a decision support model for early cancer prediction. Ensemble learning over several machine learning (ML) techniques is one approach to such classification. In the present study, various ML algorithms were explored on twenty exome datasets belonging to five cancer types. Initially, data clean-up was carried out on 4181 cancer variants with 88 features, and a derivative dataset was obtained using natural language processing and probabilistic distribution. An exploratory analysis using principal component analysis (PCA) was then performed in one and two dimensions to reduce the high dimensionality of the data. To reduce the class imbalance in the derivative dataset, oversampling was carried out using the synthetic minority oversampling technique (SMOTE). Classification algorithms such as K-nearest neighbour (KNN) and support vector machine (SVM) were first applied to the oversampled dataset. A four-layer artificial neural network with 1D batch normalization was also designed to improve model accuracy. Ensemble ML techniques such as bagging, with KNN, SVM and multilayer perceptrons (MLPs) as base classifiers, were then used to improve the weighted average performance metrics of the model. However, the small sample size made model improvement challenging. Therefore, a novel method was employed to augment the sample size using a generative adversarial network (GAN) and a triplet-based variational autoencoder (TVAE), which reconstructed the features and labels to generate data. On initial scrutiny, KNN showed a weighted average of 0.74 and SVM 0.76. Oversampling significantly improved accuracy on the derivative dataset, and the ensemble classifier raised accuracy to 82.91% when the data were divided in a 70:15:15 ratio (training, test and holdout datasets). When GAN and TVAE were used to increase the sample size, the overall evaluation metric was 0.92, with an overall comparison model value of 0.66.
Therefore, the present study designed an effective model for classifying cancers which, when applied to real-world samples, will play a major role in early cancer diagnosis.
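The oversampling step mentioned in the abstract can be illustrated with a minimal, standard-library sketch of the core idea behind SMOTE: a synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority-class neighbours. This is not the study's implementation; the function name `smote_like_oversample`, the toy data, and the parameters `k` and `seed` are all hypothetical, and a real pipeline would more likely rely on an established library such as imbalanced-learn.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Hypothetical helper for illustration only."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed, not from the study): 8 majority vs 3 minority samples in 2-D.
majority = [[float(x), float(y)] for x in range(4) for y in range(2)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 12.0]]

# Generate just enough synthetic points to match the majority class size.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the region already occupied by the minority class. In practice, oversampling is usually applied to the training portion only, so that synthetic points do not leak into the test or holdout sets; the abstract does not state which order the study used.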
The unprecedented global outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) triggered a worldwide rush to find immediate treatment strategies. With no specific drug and little data available, alternative approaches such as drug repurposing came into the limelight. To date, extensive research on the repositioning of drugs has identified numerous drugs against various important protein targets of the coronavirus strains, with hopes that these drugs will work against the major variants of concern (alpha, beta, gamma, delta, omicron) of the virus. Advancements in computational sciences have broadened the scope of repurposing through structure-based approaches (including molecular docking, molecular dynamics simulations and quantitative structure-activity relationships), network-based approaches, and artificial intelligence-based approaches built on core machine and deep learning algorithms. This review highlights the various approaches to repurposing drugs from a computational biology perspective, along with the mechanisms of action of the drugs against some of the major protein targets of SARS-CoV-2. Clinical trial data on potential repurposed drugs for COVID-19 are also highlighted, with emphasis on the major SARS-CoV-2 targets and the structural effect of variants on these targets. The interaction modelling of some important repurposed drugs is also elucidated. Finally, the merits and demerits of drug repurposing are discussed, with a focus on the scope and applications of the latest advancements in repurposing.