Extracting valuable information from large datasets is one of the most important research problems. Association rule mining is one of the most widely used methods for this purpose. Finding possible associations between items in large transaction-based datasets (finding frequent itemsets) is the most crucial part of the association rule mining task. Many single-machine association rule mining algorithms exist, but the massive amount of data available today exceeds the capacity of a single-machine algorithm. Therefore, to meet the demands of this ever-growing data, there is a need for a distributed association rule mining algorithm that can run on multiple machines. For these types of parallel/distributed applications, MapReduce is one of the best fault-tolerant frameworks. Hadoop is one of the most popular open-source software frameworks with a MapReduce-based approach for distributed storage and processing of large datasets using standalone clusters built from commodity hardware. However, the heavy disk I/O at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms have been developed for parallel computing in recent years. Among them, Spark has attracted a lot of attention because of its built-in support for distributed computation. Therefore, we implemented a distributed association rule mining algorithm on Spark, named Adaptive-Miner, which uses an adaptive approach to find frequent patterns with higher accuracy and efficiency. Adaptive-Miner uses an adaptive strategy based on partial processing of datasets. It makes execution plans before every iteration and selects the most suitable plan to minimize time and space complexity. Adaptive-Miner is a dynamic association rule mining algorithm that changes its approach based on the nature of the dataset.
Therefore, it is different and better than state-of-the-art static association rule mining algorithms. We conduct in-depth experiments to gain insight into the effectiveness, efficiency, and scalability of the Adaptive-Miner algorithm on Spark.
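To make the iterative nature of frequent-itemset mining concrete, the following is a minimal single-machine sketch of the classic Apriori algorithm in Python. It is not Adaptive-Miner and does not use Spark; the function name `apriori` and the absolute-count parameter `min_support` are illustrative assumptions. It shows why each iteration needs a full pass over the transactions, which is the step that becomes expensive when every pass goes through disk.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset whose
    support count is at least min_support (an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    # Iteration 1: count single items with one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets and keep a
        # size-k union only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Counting: another full pass over the transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
        ["milk"],
    ]
    for itemset, count in apriori(baskets, min_support=2).items():
        print(sorted(itemset), count)
```

In a distributed setting the counting pass is the natural unit to parallelize, since each partition of transactions can count candidates independently before the counts are summed.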
Neoadjuvant chemoradiotherapy is commonly used to treat rectal cancer, but patients have different levels of response and/or toxic effects. As part of the Stratification in COloRecTal cancer (S:CORT) programme, we collected 257 rectal biopsies from two cohorts: Grampian (single hospital) and Aristotle (clinical trial). All patients were subsequently treated with an identical regimen of neoadjuvant radiotherapy and capecitabine. We performed transcriptomic, mutation and copy number profiling and aimed to identify biomarkers associated with the robust pathological endpoint of complete response (CR). Key biological determinants were identified by linear regression of different pre-defined, hypothesis-driven biomarkers for radiotherapy response, adjusted for the known confounders T and N stage. A novel RNA signature was derived using a personalised bioinformatics pipeline that applied a wide range of machine learning approaches. Results were validated in a publicly available transcriptomic cohort of 107 patients treated with a similar dose of radiotherapy and 5-fluorouracil infusion. Further comparison of the biological determinants and the novel RNA signature was performed in the same cohorts and also in TCGA by linear regression. Previously published transcriptomic signatures were retrieved and assessed in the unseen validation cohort. The Grampian and Aristotle cohorts had similar statistical power and showed similar associations of CR with biological candidates, 10 of them being significant or borderline (p<0.1). Accordingly, both cohorts were merged into a single discovery set to better assess which candidates would show additive, independent association. Following multivariable stepwise regression, the final model was composed of the immune biomarkers cytotoxic lymphocytes and CMS1, associated with radiosensitivity, and the stromal TGFb Fibroblasts and epithelial APC mutations, associated with radioresistance.
The first three variables were validated in the transcriptomic validation set (Cyt lymph OR 7.09, p=0.01; CMS1 OR 5.39, p=0.02; TGFb Fib OR 0.27, p=0.04). In parallel, a 33-gene signature, trained in the discovery cohort by a comprehensive machine learning pipeline, showed excellent predictive ability in the validation cohort (0.9 AUC; 88% accuracy, 90% sensitivity, 86% specificity). Most genes were associated with at least one of the four biological features identified in the discovery set, validation set and a third cohort of colorectal cancer resections. Our novel signature showed much better predictive ability than previously published transcriptomic signatures in the unseen validation cohort. The immune, stromal and epithelial components of rectal tumours are important for predicting CR to radiotherapy in rectal cancer. A 33-gene transcriptomic biomarker can be used to effectively select patients who are highly likely to achieve CR, allowing organ preservation, while modulation of the relevant biological features in the other patients may be tested to improve their poor outcome with current treatment strategies.
Citation Format: Enric Domingo, Sanjay Rathee, Andrew Blake, Leslie M. Samuel, Graeme I. Murray, David Sebag-Montefiore, Simon Gollins, Nicholas West, Rubina Begum, Marian Duggan, Laura White, Susan Richman, Philip Quirke, James Robineau, Keara Redmond, Aikaterini Chatzipli, Ultan McDermott, Ian Tomlinson, Philip Dunne, Francesca Buffa, Tim Maughan. Stratification of radiotherapy and fluoropyrimidine-based chemotherapy from multi-omic profiling in rectal cancer biopsies [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr LB129.
Drug-induced liver injury (DILI) is a class of adverse drug reactions (ADR) that causes problems in both clinical and research settings. It is the most frequent cause of acute liver failure in the majority of Western countries and is a major cause of attrition of novel drug candidates. Manual trawling of the literature is the main route of deriving information on DILI from research studies, which makes the process inefficient and prone to human error. Therefore, an automated artificial intelligence (AI) model capable of retrieving DILI-related articles from the vast body of literature could be invaluable for the drug discovery community. In this study, we built an AI model combining natural language processing (NLP) and machine learning (ML) to address this problem. This model uses NLP to filter out meaningless text (e.g., stop words) and uses customized functions to extract relevant keywords such as singleton, pair, and triplet. These keywords are processed by an Apriori pattern mining algorithm to extract relevant patterns, which are used to estimate initial weightings for an ML classifier. Along with pattern importance and frequency, an FDA-approved drug list mentioning DILI adds extra confidence in classification. Together, these methods build a DILI classifier (DILIC) with 94.91% cross-validation accuracy and 94.14% external validation accuracy. To make DILIC as accessible as possible, including to researchers without coding experience, an R Shiny app capable of classifying single or multiple entries for DILI was developed and made available at https://researchmind.co.uk/diliclassifier/. Additionally, a GitHub link (https://github.com/sanjaysinghrathi/DILI-Classifier) for the app source code and an ISMB extended video talk (https://www.youtube.com/watch?v=j305yIVi_f8) are available as supplementary materials.
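The keyword-extraction step described above can be illustrated with a short Python sketch. It rests on assumptions not stated in the abstract: that "singleton", "pair", and "triplet" mean contiguous one-, two-, and three-word sequences after stop-word removal, and that a small hand-written stop-word set stands in for whatever list DILIC uses. The names `STOP_WORDS` and `extract_keywords` are hypothetical and are not taken from the DILIC source code.

```python
import re

# Illustrative stop-word set (an assumption, not DILIC's actual list).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "to", "was", "with"}


def extract_keywords(text, stop_words=STOP_WORDS):
    """Lowercase the text, drop stop words, and return contiguous
    1-, 2- and 3-word keyword sequences."""
    tokens = [
        t for t in re.findall(r"[a-z0-9]+", text.lower())
        if t not in stop_words
    ]
    pairs = [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]
    triplets = [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
    return {"singleton": tokens, "pair": pairs, "triplet": triplets}


if __name__ == "__main__":
    kws = extract_keywords("The drug induced liver injury in patients")
    for kind, values in kws.items():
        print(kind, values)
```

Keyword sets of this kind, collected per article, could then serve as the "transactions" passed to a pattern mining step, with each article playing the role of one basket of items.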