2021
DOI: 10.1021/acs.jcim.0c01439

A Novel Automated Framework for QSAR Modeling of Highly Imbalanced Leishmania High-Throughput Screening Data

Abstract: In silico prediction of antileishmanial activity using quantitative structure−activity relationship (QSAR) models has been developed on limited and small datasets. Nowadays, the availability of large and diverse high-throughput screening data provides an opportunity to the scientific community to model this activity from the chemical structure. In this study, we present the first KNIME automated workflow to modeling a large, diverse, and highly imbalanced dataset of compounds with antileishmanial activity. Bec…

Cited by 11 publications (11 citation statements)
References 52 publications (90 reference statements)
“…To determine the most relevant variables for the classification of SCAMs based on a DT strategy, the same variable selection algorithm as the one published in the paper by Casanova-Alvarez et al. was implemented: a selection of variables by permutation using a decision tree algorithm combined with a recursive selection of least correlated variables. This algorithm is encapsulated in a component named “variable selection by decision tree”.…”
Section: Methods
Citation type: mentioning
confidence: 99%
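
The statement above names two ingredients: ranking variables by permutation and then keeping only weakly correlated ones. The following is a minimal sketch of that general idea, not the published KNIME component. Everything in it is an assumption made for illustration: the function names, the greedy correlation filter (the source does not spell out how the "recursive selection" works), the 0.9 correlation cutoff, the toy data, and the hand-written rule `tree_rule`, which stands in for a trained decision tree.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single column is shuffled, per column."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total_drop += base - accuracy(predict, X_perm, y)
        importances.append(total_drop / n_repeats)
    return importances

def pearson(a, b):
    """Pearson correlation of two sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5 if var_a > 0 and var_b > 0 else 0.0

def select_uncorrelated(X, importances, max_abs_corr=0.9):
    """Assumed greedy filter: visit variables by decreasing importance and keep
    one only if it is useful (importance > 0) and its absolute correlation
    with every variable kept so far stays below max_abs_corr."""
    columns = list(zip(*X))
    order = sorted(range(len(columns)), key=lambda j: importances[j], reverse=True)
    kept = []
    for j in order:
        if importances[j] <= 0:
            continue
        if all(abs(pearson(columns[j], columns[k])) < max_abs_corr for k in kept):
            kept.append(j)
    return kept

# Toy data (hypothetical): x1 duplicates x0 at twice the scale, x3 is irrelevant.
levels = [0.1, 0.3, 0.7, 0.9]
grid = [(a, b) for a in levels for b in levels]
X = [[a, 2 * a, b, (i * 7 % 5) / 5] for i, (a, b) in enumerate(grid * 3)]
y = [int(a > 0.5 and b > 0.5) for a, b in grid * 3]

def tree_rule(row):
    """Hand-written stand-in for a fitted decision tree; it ignores x3."""
    return int(row[0] > 0.5 and row[1] > 1.0 and row[2] > 0.5)

importances = permutation_importance(tree_rule, X, y)
kept = select_uncorrelated(X, importances)
print("importances:", [round(v, 3) for v in importances])
print("kept variable indices:", kept)
```

In this toy setup the irrelevant variable is dropped because shuffling it never changes a prediction, and only one of the two perfectly correlated variables survives the filter.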
“…Strata can be gathered based on both the partial level and incremental level of statistic performance. Our strategy is an extension from 1D to 2D stratification of work recently published by Casanova-Alvarez and coauthors and an evolution in terms of classification and recursive variable selection of work published by Sheridan. ISE is a machine learning model agnostic and thus can be used in combination to any classification base model as illustrated in this study. ISE predictive models are implemented using the KNIME open-source software, and their workflows for prediction with full data are available in open-access…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…While there are several studies investigating resampling in the context of bioassay modelling [5, 28–30], changing the training objective has not been thoroughly investigated thus far. This study directly addresses this gap by investigating the effectiveness of a variety of recently published imbalance-insensitive loss functions for training Gradient Boosting classifiers.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In another contribution, an automated workflow was created to build a classification-based model for diverse and imbalanced data sets [38]. This workflow was tested using a data set composed of 196 173 compounds, with 1063 compounds displaying antileishmanial activity. Six different methods were tested to build a consensus model, and the model using decision trees had the best performance.…”
Citation type: mentioning
confidence: 99%
“…DLCA showed a better performance for the two data sets, compared to other consensus approaches. In another contribution, an automated workflow was created to build a classification-based model for diverse and imbalanced data sets. This workflow was tested using a data set composed of 196 173 compounds, with 1063 compounds displaying antileishmanial activity.…”
Citation type: mentioning
confidence: 99%
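
To make the imbalance quoted in the last two statements concrete, the short calculation below uses only the two counts they report (196 173 compounds, 1063 of them active). The variable names are illustrative, and the "always inactive" baseline is a hypothetical reference point added here, not a model from the paper.

```python
# Counts taken from the citation statements above.
total_compounds = 196_173
active_compounds = 1_063

inactive_compounds = total_compounds - active_compounds
active_fraction = active_compounds / total_compounds
inactives_per_active = inactive_compounds / active_compounds

# Hypothetical baseline: a classifier that labels every compound inactive.
trivial_accuracy = inactive_compounds / total_compounds

print(f"inactive compounds:   {inactive_compounds}")        # 195110
print(f"active fraction:      {active_fraction:.2%}")       # 0.54%
print(f"inactives per active: {inactives_per_active:.1f}")  # 183.5
print(f"trivial accuracy:     {trivial_accuracy:.2%}")      # 99.46%
```

With roughly 184 inactive compounds per active one, a model that never predicts activity would still score about 99.5% accuracy, which is why plain accuracy says little on a dataset like this.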