Quang H. Trinh scite author profile

The human cytochrome P450 (CYP) superfamily holds responsibilities for the metabolism of both endogenous and exogenous compounds such as drugs, cellular metabolites, and toxins. The inhibition exerted on the CYP enzymes is closely associated with adverse drug reactions encompassing metabolic failures and induced side effects. In modern drug discovery, identification of potential CYP inhibitors is, therefore, highly essential. Alongside experimental approaches, numerous computational models have been proposed to address this biochemical issue. In this study, we introduce iCYP-MFE, a computational framework for virtual screening on CYP inhibitors toward 1A2, 2C9, 2C19, 2D6, and 3A4 isoforms. iCYP-MFE contains a set of five robust, stable, and effective prediction models developed using multitask learning incorporated with molecular fingerprint-embedded features. The results show that multitask learning can remarkably leverage useful information from related tasks to promote global performance. Comparative analysis indicates that iCYP-MFE achieves three predominant tasks, one equivalent task, and one less effective task compared to state-of-the-art methods. The area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR) were two decisive metrics used for model evaluation. The prediction task for CYP2D6-inhibition achieves the highest AUC-ROC value of 0.93 while the prediction task for CYP1A2-inhibition obtains the highest AUC-PR value of 0.92. The substructural analysis preliminarily explains the nature of the CYP-inhibitory activity of compounds. An online web server for iCYP-MFE with a user-friendly interface was also deployed to support scientific communities in identifying CYP inhibitors.

show abstract

iANP-EC: Identifying Anticancer Natural Products Using Ensemble Learning Incorporated with Evolutionary Computation

Nguyen

Trinh

et al. 2022

J. Chem. Inf. Model.

View full text Add to dashboard Cite

Cancer is one of the most deadly diseases that annually kills millions of people worldwide. The investigation on anticancer medicines has never ceased to seek better and more adaptive agents with fewer side effects. Besides chemically synthetic anticancer compounds, natural products are scientifically proved as a highly potential alternative source for anticancer drug discovery. Along with experimental approaches being used to find anticancer drug candidates, computational approaches have been developed to virtually screen for potential anticancer compounds. In this study, we construct an ensemble computational framework, called iANP-EC, using machine learning approaches incorporated with evolutionary computation. Four learning algorithms (k-NN, SVM, RF, and XGB) and four molecular representation schemes are used to build a set of classifiers, among which the top-four best-performing classifiers are selected to form an ensemble classifier. Particle swarm optimization (PSO) is used to optimise the weights used to combined the four top classifiers. The models are developed by a set of curated 997 compounds which are collected from the NPACT and CancerHSP databases. The results show that iANP-EC is a stable, robust, and effective framework that achieves an AUC-ROC value of 0.9193 and an AUC-PR value of 0.8366. The comparative analysis of molecular substructures between natural anticarcinogens and nonanticarcinogens partially unveils several key substructures that drive anticancerous activities. We also deploy the proposed ensemble model as an online web server with a user-friendly interface to support the research community in identifying natural products with anticancer activities.

show abstract

iPromoter-Seqvec: identifying promoters using bidirectional long short-term memory and sequence-embedded features

et al. 2022

View full text Add to dashboard Cite

Background Promoters, non-coding DNA sequences located at upstream regions of the transcription start site of genes/gene clusters, are essential regulatory elements for the initiation and regulation of transcriptional processes. Furthermore, identifying promoters in DNA sequences and genomes significantly contributes to discovering entire structures of genes of interest. Therefore, exploration of promoter regions is one of the most imperative topics in molecular genetics and biology. Besides experimental techniques, computational methods have been developed to predict promoters. In this study, we propose iPromoter-Seqvec – an efficient computational model to predict TATA and non-TATA promoters in human and mouse genomes using bidirectional long short-term memory neural networks in combination with sequence-embedded features extracted from input sequences. The promoter and non-promoter sequences were retrieved from the Eukaryotic Promoter database and then were refined to create four benchmark datasets. Results The area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUCPR) were used as two key metrics to evaluate model performance. Results on independent test sets showed that iPromoter-Seqvec outperformed other state-of-the-art methods with AUCROC values ranging from 0.85 to 0.99 and AUCPR values ranging from 0.86 to 0.99. Models predicting TATA promoters in both species had slightly higher predictive power compared to those predicting non-TATA promoters. With a novel idea of constructing artificial non-promoter sequences based on promoter sequences, our models were able to learn highly specific characteristics discriminating promoters from non-promoters to improve predictive efficiency. Conclusions iPromoter-Seqvec is a stable and robust model for predicting both TATA and non-TATA promoters in human and mouse genomes. Our proposed method was also deployed as an online web server with a user-friendly interface to support research communities. Links to our source codes and web server are available at https://github.com/mldlproject/2022-iPromoter-Seqvec.

show abstract

Predicting Antimalarial Activity in Natural Products Using Pretrained Bidirectional Encoder Representations from Transformers

Nguyen-Vo

Trinh

Nguyen

et al. 2021

J. Chem. Inf. Model.

View full text Add to dashboard Cite

Malaria is a threatening disease that has claimed many lives and has a high prevalence rate annually. Through the past decade, there have been many studies to uncover effective antimalarial compounds to combat this disease. Alongside chemically synthesized chemicals, a number of natural compounds have also been proven to be as effective in their antimalarial properties. Besides experimental approaches to investigate antimalarial activities in natural products, computational methods have been developed with satisfactory outcomes obtained. In this study, we propose a novel molecular encoding scheme based on Bidirectional Encoder Representations from Transformers and used our pretrained encoding model called NPBERT with four machine learning algorithms, including k-Nearest Neighbors (k-NN), Support Vector Machines (SVM), eXtreme Gradient Boosting (XGB), and Random Forest (RF), to develop various prediction models to identify antimalarial natural products. The results show that SVM models are the best-performing classifiers, followed by the XGB, k-NN, and RF models. Additionally, comparative analysis between our proposed molecular encoding scheme and existing state-of-the-art methods indicates that NPBERT is more effective compared to the others. Moreover, the deployment of transformers in constructing molecular encoders is not limited to this study but can be utilized for other biomedical applications.

show abstract

i4mC-GRU: Identifying DNA N4-Methylcytosine sites in mouse genomes using bidirectional gated recurrent unit and sequence-embedded features

Nguyen-Vo

Trinh

Nguyen

et al. 2023

Computational and Structural Biotechnology Journal

View full text Add to dashboard Cite

iR1mA-LSTM: Identifying N$$^{1}$$-Methyladenosine Sites in Human Transcriptomes Using Attention-Based Bidirectional Long Short-Term Memory

Trang

Nguyen-Vo

Trinh

et al. 2023

View full text Add to dashboard Cite

iNSP‐GCAAP: Identifying nonclassical secreted proteins using global composition of amino acid properties

Trang

Nguyen-Vo

Pham³

et al. 2022

Proteomics

View full text Add to dashboard Cite

Nonclassical secreted proteins (NSPs) refer to a group of proteins released into the extracellular environment under the facilitation of different biological transporting pathways apart from the Sec/Tat system. As experimental determination of NSPs is often costly and requires skilled handling techniques, computational approaches are necessary. In this study, we introduce iNSP‐GCAAP, a computational prediction framework, to identify NSPs. We propose using global composition of a customized set of amino acid properties to encode sequence data and use the random forest (RF) algorithm for classification. We used the training dataset introduced by Zhang et al. (Bioinformatics, 36(3), 704–712, 2020) to develop our model and test it with the independent test set in the same study. The area under the receiver operating characteristic curve on that test set was 0.9256, which outperformed other state‐of‐the‐art methods using the same datasets. Our framework is also deployed as a user‐friendly web‐based application to support the research community to predict NSPs.

show abstract

An Efficient Ensemble Deep Learning Framework for Blood-Brain Barrier Permeability Prediction

Vinh

Trinh

Nguyen

et al. 2023

Preprint

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Quang H. Trinh

iCYP-MFE: Identifying Human Cytochrome P450 Inhibitors Using Multitask Learning and Molecular Fingerprint-Embedded Encoding

iANP-EC: Identifying Anticancer Natural Products Using Ensemble Learning Incorporated with Evolutionary Computation

iPromoter-Seqvec: identifying promoters using bidirectional long short-term memory and sequence-embedded features

Predicting Antimalarial Activity in Natural Products Using Pretrained Bidirectional Encoder Representations from Transformers

i4mC-GRU: Identifying DNA N4-Methylcytosine sites in mouse genomes using bidirectional gated recurrent unit and sequence-embedded features

iR1mA-LSTM: Identifying N$$^{1}$$-Methyladenosine Sites in Human Transcriptomes Using Attention-Based Bidirectional Long Short-Term Memory

iNSP‐GCAAP: Identifying nonclassical secreted proteins using global composition of amino acid properties

An Efficient Ensemble Deep Learning Framework for Blood-Brain Barrier Permeability Prediction

Contact Info

Product

Resources

About