Bioactivity descriptors for uncharacterized chemical compounds

Bertoni, Martino; Duran-Frigola, Miquel; Badia-i-Mompel, Pau; Pauls, Eduardo; Orozco-Ruiz, Modesto; Guitart-Pla, Oriol; Alcalde, Víctor; Díaz, Víctor M.; Berenguer-Llergo, Antoni; Brun-Heath, Isabelle; Villegas, Núria; García de Herreros, Antonio; Aloy, Patrick

doi:10.1038/s41467-021-24150-4

Cited by 56 publications

(53 citation statements)

References 44 publications

“…To obtain the best performing models, we tried three different feature extraction methods i.e. bioactivity-based descriptors (Signaturizer library) 46 , chemistry-based molecular descriptors (Mordred software) 47 , and graph-based features (DeepChem library) 48 . In addition to these diversified features, we also tried multiple machine learning/deep learning-based classification algorithms for model building such as Random Forest (RF), Multilayer Perceptron (MLP), k-Nearest Neighbor (KNN), Support Vector Machine (SVM), Stochastic Gradient Descent (SGD), Logistic Regression (LR), GraphConvModel (GCM), Attentive FP (AFP), Graph Convolution Network (GCN), and Graph Attention Network (GAT) (Supplementary Figure 1b) .…”

Section: Results (mentioning)

confidence: 99%

Section: Carcinogenicity Prediction (mentioning)

confidence: 99%

Artificial Intelligence uncovers carcinogenic human metabolites

Mittal, Gautam, Roshan, et al. 2021. Preprint.

The genome of a eukaryotic cell is often vulnerable to both intrinsic and extrinsic threats due to its constant exposure to a myriad of heterogeneous chemical compounds. Despite the availability of innate DNA damage repair pathways, some genomic lesions trigger cells for malignant transformation. Accurate prediction of carcinogens is an ever-challenging task due to the limited information about bonafide (non)carcinogens. This, in turn, constrains the generalisability of such models. We developed a novel ensemble classifier (Metabokiller) that accurately recognizes carcinogens by quantitatively assessing their chemical composition as well as potential to induce proliferation, oxidative stress, genotoxicity, alterations in epigenetic signatures, and activation of anti-apoptotic pathways, therefore obviates the need for bonafide (non)carcinogens for training model. Concomitant with the carcinogenicity prediction, it also reveals the contribution of the aforementioned biochemical processes in carcinogenicity, thereby making the proposed approach highly interpretable. Metabokiller outwits existing best practice methods for the carcinogenicity prediction task. We used Metabokiller to decode the cellular endogenous metabolic threats by screening a large pool of human metabolites and identified putative metabolites that could potentially trigger malignancy in normal cells. To cross-validate our predictions, we performed an array of functional assays and genome-wide transcriptome analysis on two Metabokiller-flagged, and previously uncharacterized human metabolites by using Saccharomyces cerevisiae as a model organism and observed larger synergy with the prediction probabilities. Finally, the carcinogenicity potential of these metabolites was evaluated using a malignancy transformation assay on human cells.


“…The RDMD contained 200 molecular descriptors, including physicochemical properties and structure characteristics, which have been used in many studies and have achieved satisfactory results [ 23 , 24 , 25 , 26 , 27 ]. The CCMD is a novel type of biological descriptor containing 25 various bioactive spaces [ 28 ]. Moreover, the simple representation of CCMD is compatible with different types of computational tools in a multi-dimensional form.…”

Section: Methods (mentioning)

confidence: 99%

Drug-Induced Immune Thrombocytopenia Toxicity Prediction Based on Machine Learning

et al. 2022

Drug-induced immune thrombocytopenia (DITP) often occurs in patients receiving many drug treatments simultaneously. However, clinicians usually fail to accurately distinguish which drugs can be plausible culprits. Despite significant advances in laboratory-based DITP testing, in vitro experimental assays have been expensive and, in certain cases, cannot provide a timely diagnosis to patients. To address these shortcomings, this paper proposes an efficient machine learning-based method for DITP toxicity prediction. A small dataset consisting of 225 molecules was constructed. The molecules were represented by six fingerprints, three descriptors, and their combinations. Seven classical machine learning-based models were examined to determine an optimal model. The results show that the RDMD + PubChem-k-NN model provides the best prediction performance among all the models, achieving an area under the curve of 76.9% and overall accuracy of 75.6% on the external validation set. The application domain (AD) analysis demonstrates the prediction reliability of the RDMD + PubChem-k-NN model. Five structural fragments related to the DITP toxicity are identified through information gain (IG) method along with fragment frequency analysis. Overall, as far as known, it is the first machine learning-based classification model for recognizing chemicals with DITP toxicity and can be used as an efficient tool in drug design and clinical therapy.

“…Accordingly, graph-based DNNs including message passing networks have increasingly been investigated for learning model-internal representations from molecular structure (Chuang et al, 2020). In addition to graph-based representation learning, DL has recently also been applied to predict biological signatures of test compounds (Bertoni et al, 2021), which might be combined with standard structural descriptors in virtual screening (vide supra). However, on the basis of currently available data, it remains to be determined whether alternative molecular representations-be they learned from graphs or predicted-might yield higher performance in ML and other applications than long-used standards such as molecular fingerprints or numerical descriptors.…”

Section: Deep Neural Network (mentioning)

confidence: 99%

Deep Machine Learning for Computer-Aided Drug Design

Bajorath 2022. Front. Drug. Discov.

In recent years, deep learning (DL) has led to new scientific developments with immediate implications for computer-aided drug design (CADD). These include advances in both small molecular and macromolecular modeling, as highlighted herein. Going forward, these developments also challenge CADD in different ways and require further progress to fully realize their potential for drug discovery. For CADD, these are exciting times and at the very least, the dynamics of the discipline will further increase.
