StackHCV: a web-based integrative machine-learning framework for large-scale identification of hepatitis C virus NS5B inhibitors

Malik, Aijaz Ahmad; Chotpatiwetchkul, Warot; Phanus‐umporn, Chuleeporn; Nantasenamat, Chanin; Charoenkwan, Phasit; Shoombuatong, Watshara

doi:10.1007/s10822-021-00418-1

Cited by 16 publications

(13 citation statements)

References 62 publications


“…Unlike other ensemble learning strategies, this strategy enables an automatic integration of different ML classifiers in order to construct a single robust prediction model 23 . The stacked strategy has successfully achieved better performance as compared with its constituent baseline models 23 , 24 , 27 , 30 , 31 . The stacking strategy consists of two main steps, while the corresponding models at each step are referred to as baseline and meta models, respectively.…”

Section: Methods (mentioning)

confidence: 99%


SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins

Ahmad, Charoenkwan, Quinn, et al. 2022. Sci Rep (self-citation)

Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored a comprehensive set of 13 feature descriptors from different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were treated as a new feature vector. Finally, we utilized a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves better predictive performance than its constituent baseline models and existing methods. We anticipate SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source code and datasets for this work are available for download in the GitHub repository (https://github.com/saeed344/SCORPION).
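The flow described in this abstract (baseline models, then probabilistic features, then a stacked model) can be illustrated with a small standard-library sketch. Everything in it is an assumption made for illustration: the toy centroid learner, the two numeric descriptors, the fivefold split, and the logistic-regression meta model are hypothetical stand-ins, not the descriptors, algorithms, or feature selection used by SCORPION or StackHCV.

```python
import math

def fit_centroids(xs, ys):
    """Toy baseline learner (illustrative only): mean descriptor value per class."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)

def positive_probability(model, x):
    """Score in (0, 1) that is higher when x lies closer to the positive-class mean."""
    mu_pos, mu_neg = model
    return 1.0 / (1.0 + math.exp(-(abs(x - mu_neg) - abs(x - mu_pos))))

def out_of_fold_pf(column, ys, k=5):
    """Probabilistic feature for one descriptor: every sample is scored by a
    baseline model trained on the other k-1 folds, never on itself."""
    n = len(ys)
    pf = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit_centroids([column[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):
            pf[i] = positive_probability(model, column[i])
    return pf

def fit_logistic(rows, ys, lr=0.5, epochs=500):
    """Meta model: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, y in zip(rows, ys):
            p = 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, row)))))
            for j, xj in enumerate(row):
                grad_w[j] += (p - y) * xj
            grad_b += p - y
        w = [wj - lr * g / len(ys) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(ys)
    return w, b

# Deterministic toy data (hypothetical): two descriptors, alternating labels.
n = 40
labels = [i % 2 for i in range(n)]
descriptor_a = [2.0 * y + ((i * 7) % 10) / 10 - 0.5 for i, y in enumerate(labels)]
descriptor_b = [1.5 * y + ((i * 3) % 10) / 10 for i, y in enumerate(labels)]

# Step 1: one probabilistic-feature column per baseline model.
pf_columns = [out_of_fold_pf(col, labels) for col in (descriptor_a, descriptor_b)]
pf_matrix = [list(row) for row in zip(*pf_columns)]

# Step 2: train the meta model on the probabilistic features.
weights, bias = fit_logistic(pf_matrix, labels)
predictions = [int(bias + sum(w * x for w, x in zip(weights, row)) > 0) for row in pf_matrix]
stack_accuracy = sum(p == y for p, y in zip(predictions, labels)) / n
print(f"training accuracy of the stacked model: {stack_accuracy:.2f}")
```

Scoring each sample with a model that never saw it keeps the meta model from learning on leaked labels. The accuracy printed here is measured on the training data and only confirms that the sketch runs end to end.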


Section: Methods (mentioning)

confidence: 99%

“…The feature subset achieving the highest Matthews correlation coefficient (MCC) was considered as the optimal feature subset. The implementation of these classifiers in the two-step feature selection strategy is the same as used in our previous studies 18 , 31 , 38 – 41 …”

Section: Methods (mentioning)

confidence: 99%

SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins

Ahmad, Charoenkwan, Quinn, et al. 2022. Sci Rep (self-citation)
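The selection rule quoted above (keep the feature subset with the highest MCC) can be sketched as follows. The MCC formula is the standard one; the candidate subset names and their predictions are hypothetical toy values, not results from any of the cited studies.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_subset_by_mcc(y_true, predictions_by_subset):
    """Pick the candidate feature subset whose predictions give the highest MCC."""
    scores = {
        name: mcc(*confusion_counts(y_true, y_pred))
        for name, y_pred in predictions_by_subset.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical cross-validated predictions for three candidate subsets.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
candidates = {
    "top_5_features": [1, 1, 0, 0, 0, 0, 1, 1],
    "top_10_features": [1, 1, 1, 0, 0, 0, 0, 1],
    "top_15_features": [1, 0, 0, 0, 0, 0, 0, 0],
}
best_name, mcc_scores = best_subset_by_mcc(y_true, candidates)
print(best_name, round(mcc_scores[best_name], 3))  # top_10_features 0.5
```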


“…In this phase, we applied 12 well-known feature encodings to extract samples in the AR-TRN dataset, including CKD, CKDExt, CKDGraph, AP2D, KR, MACCS, Circle, Estate, Hybrid, PubChem, FP4C, and FP4. These molecular descriptors are widely used to represent several types of inhibitors [ 41 , 45 – 48 ]. In the meanwhile, 13 popular ML algorithms were selected for the construction of baseline models, including RF, AdaBoost (ADA), light gradient boosting machine (LGBM), partial least squares (PLS), multilayer perceptron (MLP), naive Bayes (NB), decision tree (DT), extremely randomized trees (ET), extreme gradient boosting (XGB), k-nearest neighbor (KNN), logistic regression (LR), support vector machine (SVM) combined with linear (SVMLN) and radial basis function (SVMRBF) kernels.…”

Section: Methods (mentioning)

confidence: 99%

DeepAR: a novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists

et al. 2023 (self-citation)

Drug resistance represents a major obstacle to therapeutic innovations and is a prevalent feature in prostate cancer (PCa). Androgen receptors (ARs) are the hallmark therapeutic target for prostate cancer modulation and AR antagonists have achieved great success. However, rapid emergence of resistance contributing to PCa progression is the ultimate burden of their long-term usage. Hence, the discovery and development of AR antagonists with the capability to combat this resistance remains an avenue for further exploration. Therefore, this study proposes a novel deep learning (DL)-based hybrid framework, named DeepAR, to accurately and rapidly identify AR antagonists by using only the SMILES notation. Specifically, DeepAR is capable of extracting and learning the key information embedded in AR antagonists. Firstly, we established a benchmark dataset by collecting active and inactive compounds against AR from the ChEMBL database. Based on this dataset, we developed and optimized a collection of baseline models by using a comprehensive set of well-known molecular descriptors and machine learning algorithms. Then, these baseline models were utilized for creating probabilistic features. Finally, these probabilistic features were combined and used for the construction of a meta-model based on a one-dimensional convolutional neural network. Experimental results indicated that DeepAR is a more accurate and stable approach for identifying AR antagonists in terms of the independent test dataset, by achieving an accuracy of 0.911 and MCC of 0.823. In addition, our proposed framework is able to provide feature importance information by leveraging a popular computational approach, named SHapley Additive exPlanations (SHAP). Meanwhile, the characterization and analysis of potential AR antagonist candidates were achieved through the SHAP waterfall plot and molecular docking. The analysis inferred that N-heterocyclic moieties, halogenated substituents, and a cyano functional group were significant determinants of potential AR antagonists. Lastly, we implemented an online web server by using DeepAR (at http://pmlabstack.pythonanywhere.com/DeepAR). We anticipate that DeepAR could be a useful computational tool for community-wide facilitation of AR candidates from a large number of uncharacterized compounds.


“…Unlike other conventional ensemble strategies, the stacking strategy integrates the strengths of different predictive models without human intervention to generate the final meta-predictor 44 – 47 . To date, numerous previous studies have indicated that the final meta-predictor can potentially attain a more stable predictive performance 48 – 50 . The overall workflow for the development of StackPR contains three major steps (i.e., baseline model construction, new feature vector generation, and meta-predictor development) as provided in the paragraphs hereafter (Fig.…”

Section: Methods (mentioning)

confidence: 99%

StackPR is a new computational approach for large-scale identification of progesterone receptor antagonists using the stacking strategy

Schaduangrat, Anuwongcharoen, Moni, et al. 2022. Sci Rep (self-citation)

Progesterone receptors (PRs) are implicated in various cancers since their presence/absence can determine clinical outcomes. The overstimulation of progesterone can facilitate oncogenesis and thus, its modulation through PR inhibition is urgently needed. To address this issue, a novel stacked ensemble learning approach (termed StackPR) is presented for fast, accurate, and large-scale identification of PR antagonists using only SMILES notation without the need for 3D structural information. We employed six popular machine learning (ML) algorithms (i.e., logistic regression, partial least squares, k-nearest neighbor, support vector machine, extremely randomized trees, and random forest) coupled with twelve conventional molecular descriptors to create 72 baseline models. Then, a genetic algorithm in conjunction with the self-assessment-report approach was utilized to select m out of the 72 baseline models as a means of developing the final meta-predictor using the stacking strategy and tenfold cross-validation test. Experimental results on the independent test dataset show that StackPR achieved impressive predictive performance with an accuracy of 0.966 and a Matthews correlation coefficient of 0.925. In addition, analysis based on the SHapley Additive exPlanation algorithm and molecular docking indicates that aliphatic hydrocarbons and nitrogen-containing substructures were the most important features for having PR antagonist activity. Finally, we implemented an online webserver using StackPR, which is freely accessible at http://pmlabstack.pythonanywhere.com/StackPR. StackPR is anticipated to be a powerful computational tool for the large-scale identification of unknown PR antagonist candidates for follow-up experimental validation.
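The count of 72 baseline models in this abstract follows from pairing each of the six algorithms with each of the twelve descriptors. A minimal sketch of that enumeration is given here; the algorithm names come from the abstract, while the descriptor names are hypothetical placeholders because the abstract states only how many there are.

```python
from itertools import product

# The six algorithms named in the StackPR abstract.
ALGORITHMS = [
    "logistic regression",
    "partial least squares",
    "k-nearest neighbor",
    "support vector machine",
    "extremely randomized trees",
    "random forest",
]

# Hypothetical placeholder names: the abstract gives only the count (twelve).
DESCRIPTORS = [f"descriptor_{i:02d}" for i in range(1, 13)]

# One baseline model per (algorithm, descriptor) pair.
baseline_grid = list(product(ALGORITHMS, DESCRIPTORS))
print(len(baseline_grid))  # 72
```

Each pair in `baseline_grid` would correspond to one trained baseline model, from which the abstract's selection step then keeps m.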
