Tu Kien T. Le scite author profile

Background: Epitope identification is an essential step toward synthetic vaccine development, since epitopes play an important role in activating the immune response. Classical experimental approaches are laborious and time-consuming, and therefore computational methods for generating epitope candidates have been actively studied. Most of these methods, however, rely on sophisticated nonlinear techniques to achieve higher predictive performance. The use of these techniques tends to diminish interpretability with respect to binding potential: that is, they do not provide much insight into binding mechanisms.

Results: We have developed a novel epitope prediction method named EpicCapo and its variants, EpicCapo+ and EpicCapo+REF. Nonapeptides were encoded numerically using a novel peptide-encoding scheme for machine learning algorithms that utilizes 40 amino acid pairwise contact potentials (referred to as AAPPs throughout this paper). EpicCapo+ and EpicCapo+REF outperformed other state-of-the-art methods without losing interpretability. Interestingly, the most informative AAPPs estimated by our study were those developed by Micheletti and Simons, whereas previous studies utilized two AAPPs developed by Miyazawa & Jernigan and Betancourt & Thirumalai. In addition, we found that all amino acid positions in nonapeptides, including non-anchor positions, could affect the performance of the predictive models. Finally, EpicCapo+REF was applied to identify candidates of promiscuous epitopes. As a result, 67.1% of the predicted nonapeptide epitopes were consistent with preceding studies based on immunological experiments.

Conclusions: Our method achieved high performance in testing with benchmark datasets. In addition, our study identified a number of candidates of promiscuous CTL epitopes consistent with previously reported immunological experiments. We speculate that our techniques may be useful in the development of new vaccines.
The R implementation of EpicCapo+REF is available at http://pirun.ku.ac.th/~fsciiok/EpicCapoREF.zip. Datasets are available at http://pirun.ku.ac.th/~fsciiok/Datasets.zip.

Predicting Beta-Turns and Beta-Turn Types Using a Novel Over-Sampling Approach

Nguyen, Dang, et al. 2014. JBiSE

The β-turn is one of the most important reverse turns because of its role in protein folding. Many computational methods have been studied for predicting β-turns and β-turn types. However, because the datasets are imbalanced, performance is still inadequate. In this study, we proposed a novel over-sampling technique, FOST, to deal with the class-imbalance problem. Experimental results on three standard benchmark datasets showed that our method is comparable with state-of-the-art methods. In addition, we applied our algorithm to five benchmark datasets from the UCI Machine Learning Repository and achieved significant improvement in G-mean and Sensitivity. This suggests that our method is also effective for imbalanced data other than β-turns and β-turn types.

A novel over-sampling method and its application to miRNA prediction

Dang, Hirose, Saethang, et al. 2013. JBiSE

MicroRNAs (miRNAs) are short (~22 nt) non-coding RNAs that play an indispensable role in gene regulation of many biological processes. Most current computational, comparative, and non-comparative methods classify human precursor microRNA (pre-miRNA) hairpins from both genome pseudo hairpins and other non-coding RNAs (ncRNAs). Although a few approaches have achieved promising results by applying class imbalance learning methods, the problem has not yet been solved completely by existing methods because of the imbalanced class distribution in the datasets. For example, SMOTE is a well-known general over-sampling method that addresses this problem; however, in some cases it fails to improve, or even reduces, classification performance. Therefore, we developed a novel over-sampling method named incremental-SMOTE to distinguish human pre-miRNA hairpins from both genome pseudo hairpins and other ncRNAs. Experimental results on pre-miRNA datasets from Batuwita et al. showed that our method achieved better Sensitivity and G-mean than the control (no over-sampling), SMOTE, and several modified successors of SMOTE, including safe-level-SMOTE and borderline-SMOTE. In addition, we applied the novel method to five imbalanced benchmark datasets from the UCI Machine Learning Repository and achieved improvements in Sensitivity and G-mean. These results suggest that our method outperforms SMOTE and several of its successors in various biomedical classification problems, including miRNA classification.
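For readers unfamiliar with the baseline mentioned in this abstract, the following is a minimal sketch of plain SMOTE-style interpolation in pure Python. It is not the incremental-SMOTE method of the paper, whose details are not given here. The function name `smote`, its parameters, and the toy data are all illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.

    Illustrative sketch of basic SMOTE only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features per sample (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, it always stays inside the bounding box of the minority class. When the classes overlap, such points can fall among majority samples, which is one plausible reason plain SMOTE sometimes fails to help.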

D-IMPACT: A Data Preprocessing Algorithm to Improve the Performance of Clustering

Vu, Hirose, Saethang, et al. 2014. JSEA

In this study, we propose a data preprocessing algorithm called D-IMPACT inspired by the IMPACT clustering algorithm. D-IMPACT iteratively moves data points based on attraction and density to detect and remove noise and outliers, and separate clusters. Our experimental results on two-dimensional datasets and practical datasets show that this algorithm can produce new datasets such that the performance of the clustering algorithm is improved.

A Novel Over-Sampling Method and its Application to Cancer Classification from Gene Expression Data

Dang, Hirose, Bui, et al. 2013. CBIJ

One of the most critical and frequent problems in biomedical data classification is imbalanced class distribution, where samples from the majority class significantly outnumber those from the minority class. SMOTE is a well-known general over-sampling method used to address this problem; however, in some cases it fails to improve, or even reduces, classification performance. To address these issues, we have developed a novel minority over-sampling method named safe-SMOTE. Experimental results from two gene expression datasets for cancer classification (i.e., colon-cancer and leukemia) and six imbalanced benchmark datasets from the UCI Machine Learning Repository showed that our method achieved better sensitivity and G-mean values than both the control method (i.e., no over-sampling) and SMOTE. For example, in the colon-cancer dataset, the sensitivity and specificity achieved by SMOTE (81.36% and 88.63%) were lower than for the control method (81.59% and 89.50%), whereas safe-SMOTE increased both values (81.82% and 90.50%). Similarly, the G-mean value of the control (85.45%) decreased to 84.91% when SMOTE was employed, but increased to 86.04% when using safe-SMOTE. In the leukemia dataset, SMOTE was able to improve the sensitivity and G-mean values with respect to the control; however, safe-SMOTE achieved even greater improvements for both of these criteria.
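The abstract does not define G-mean. Assuming the conventional definition, the geometric mean of sensitivity and specificity, the colon-cancer figures quoted above can be checked with a short sketch. The function name `g_mean` and the dictionary layout are illustrative choices.

```python
import math


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    return math.sqrt(sensitivity * specificity)


# (sensitivity, specificity) pairs quoted in the abstract for colon cancer.
reported = {
    "control": (0.8159, 0.8950),
    "SMOTE": (0.8136, 0.8863),
    "safe-SMOTE": (0.8182, 0.9050),
}

recomputed = {name: g_mean(se, sp) for name, (se, sp) in reported.items()}
for name, value in recomputed.items():
    print(f"{name}: {value:.2%}")
```

The recomputed values agree with the quoted G-means (85.45%, 84.91%, 86.04%) to within a few hundredths of a percentage point. The small residual may reflect rounding or averaging over cross-validation folds; the abstract does not say.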

Inference of domain-domain interactions by matrix factorisation and domain-level features

Hirose, Nguyen, et al. 2014. IJFIPM

Predicting residue contacts for protein-protein interactions by integration of multiple information

Le, Hirose, Vu, et al. 2014. JBiSE

Detailed knowledge of the interfacial region between interacting proteins is not only helpful in annotating protein function, but also very important for structure-based drug design and disease treatment. However, identifying this region is one of the most difficult tasks, and current methods are constrained by several factors. In this study, we developed a new method to predict residue-residue contacts of two interacting protein domains by integrating information about evolutionary couplings and amino acid pairwise contact potentials, as well as domain-domain interaction interfaces. The experimental results showed that our proposed method outperformed the previous method on the same datasets. Moreover, the method promises an improvement in the source of template-based protein docking.


Tu Kien T. Le

PAAQD: Predicting immunogenicity of MHC class I binding peptides using amino acid pairwise contact potentials and quantum topological molecular similarity descriptors

EpicCapo: epitope prediction using combined information of amino acid pairwise contact potentials and HLA-peptide contact site information