Chuyu Wang scite author profile

A novel insight into Gene Ontology semantic similarity

Existing methods for computing the semantic similarity between Gene Ontology (GO) terms often rely on external datasets and are therefore not intrinsic to GO. Moreover, when used to measure the similarity of proteins, they not only fail to handle identical annotations but also show a strong bias toward well-annotated proteins. Inspired by the concepts of cellular differentiation and dedifferentiation in developmental biology, we propose a shortest semantic differentiation distance (SSDD), based on the concept of semantic totipotency, to measure the semantic similarity of GO terms and, in turn, to compare the functional similarity of proteins. Evaluated against human ratings and a benchmark dataset, SSDD improved upon existing methods for computing the semantic similarity of GO terms. An in-depth analysis shows that SSDD can distinguish identical annotations and does not depend on annotation richness, and therefore produces less biased and more reliable results. Online services are available at the Gene Functional Similarity Analysis Tools website (GFSAT: http://nclab.hit.edu.cn/GFSAT).
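
The abstract does not state the SSDD formula, so the sketch below does not implement SSDD or semantic totipotency. As a hedged illustration of the graph setting that GO similarity measures work in, it computes a classic edge-counting distance: the fewest is_a edges linking two terms through a shared ancestor in a directed acyclic graph. The toy ontology and the names `PARENTS`, `shortest_path_distance`, and `path_similarity` are hypothetical, and mapping a distance d to a similarity of 1 / (1 + d) is an assumption made only for this example.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its parent terms (is_a edges).
# These IDs are illustrative placeholders, not real GO terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}


def ancestor_depths(term):
    """Return {ancestor: fewest edges from term up to it}, with term itself at 0."""
    dist = {term: 0}
    queue = deque([term])
    while queue:  # breadth-first search upward through the DAG
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def shortest_path_distance(t1, t2):
    """Fewest edges linking t1 and t2 through a shared ancestor (upward paths only)."""
    d1, d2 = ancestor_depths(t1), ancestor_depths(t2)
    common = d1.keys() & d2.keys()
    return min(d1[a] + d2[a] for a in common)


def path_similarity(t1, t2):
    # Assumed mapping of distance to (0, 1]; identical terms score 1.0.
    return 1.0 / (1.0 + shortest_path_distance(t1, t2))
```

Here `shortest_path_distance("A1", "A2")` returns 2 (through `A`) and `shortest_path_distance("A1", "B1")` returns 4 (through `root`). Breadth-first search is enough because every edge has unit length.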

RFAmyloid: A Web Server for Predicting Amyloid Proteins

Niu, Wang, et al. 2018, IJMS

Amyloids are insoluble fibrous proteins whose misaggregation can lead to diseases such as Alzheimer’s disease and Creutzfeldt–Jakob disease. Identifying amyloids is therefore essential for discovering and understanding these diseases. We established a novel random-forest-based predictor, RFAmy, to identify amyloids. It uses two feature extraction methods: the SVMProt 188-D method, based on protein composition and physicochemical properties, and the Pse-in-One method, based on amino acid composition, autocorrelation, pseudo amino acid composition, profile-based features, and predicted structure features. In a ten-fold cross-validation test, RFAmy achieved an overall accuracy of 89.19% and an F-measure of 0.891. Comparison experiments with other features, classifiers, and existing methods demonstrate the effectiveness of RFAmy in predicting amyloid proteins. The RFAmy predictor proposed in this paper can be accessed through the URL .
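
Ten-fold cross-validation and the F-measure are standard evaluation tools, so they can be sketched independently of RFAmy. The code below is a minimal sketch, assuming binary labels and a plain (unstratified) random split; `k_fold_indices` and `accuracy_and_f_measure` are hypothetical helper names, and the paper may average the F-measure over classes rather than report it for the positive class only. It does not train a random forest or extract SVMProt or Pse-in-One features.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for plain (unstratified) k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        # Training set = every sample not in the held-out fold.
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def accuracy_and_f_measure(tp, fp, tn, fn):
    """Overall accuracy and positive-class F-measure (harmonic mean of precision and recall)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)
```

For made-up counts, `accuracy_and_f_measure(40, 10, 45, 5)` gives an accuracy of 0.85 and an F-measure of 80/95 ≈ 0.842, since the F-measure simplifies to 2TP / (2TP + FP + FN). In a full run, the confusion counts would be pooled (or the metrics averaged) over the ten held-out folds.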

MiRTDL: A Deep Learning Approach for miRNA Target Prediction

Cheng, Guo, Wang, et al. 2016, IEEE/ACM Trans. Comput. Biol. and Bioinf.

MicroRNAs (miRNAs) regulate genes that are associated with various diseases. To better understand miRNAs, their regulatory mechanism needs to be investigated and their real targets identified. Here, we present miRTDL, a new miRNA target prediction algorithm based on a convolutional neural network (CNN). The CNN automatically extracts essential information from the input data rather than relying completely on artificially generated input datasets, which is an advantage when the precise mechanisms of miRNA targeting are poorly known. In this work, a constraint relaxing method is first used to construct a balanced training dataset, avoiding the inaccurate predictions caused by the existing unbalanced dataset. miRTDL is then applied to 1,606 experimentally validated miRNA target pairs. The results show that miRTDL outperforms existing target prediction algorithms, achieving significantly higher sensitivity, specificity, and accuracy of 88.43%, 96.44%, and 89.98%, respectively. We also investigated the miRNA target mechanism, and the results show that the complementation features are more important than the other features.
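
Sensitivity, specificity, and accuracy have standard definitions in terms of true and false positives and negatives. The sketch below shows how they are computed, assuming binary labels where 1 marks a true miRNA target pair and 0 a non-target; the function name and the example labels are hypothetical, and the code does not reproduce miRTDL's CNN or its constraint relaxing step.

```python
def sensitivity_specificity_accuracy(y_true, y_pred):
    """Binary metrics for labels 1 (target) and 0 (non-target); assumes both classes occur."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # fraction of true targets recovered
    specificity = tn / (tn + fp)  # fraction of non-targets correctly rejected
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical labels: 4 targets, 6 non-targets; one miss in each class.
print(sensitivity_specificity_accuracy(
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
))
```

With these made-up labels the sensitivity is 0.75, the specificity is 5/6, and the accuracy is 0.8. Reporting sensitivity and specificity alongside accuracy matters for unbalanced data: a classifier that always predicts the majority class can reach high accuracy while its sensitivity on the minority class is zero.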

Machine learning and its applications in plant molecular studies

Sun, Wang, Ding, et al. 2019

The advent of high-throughput genomic technologies has resulted in the accumulation of massive amounts of genomic information. However, effectively analyzing these data remains a challenge for biologists. Machine learning can provide tools for better and more efficient data analysis. Unfortunately, because many plant biologists are unfamiliar with machine learning, its application in plant molecular studies has been restricted to a few species and a limited set of algorithms. In this study, we therefore describe the basic steps for developing machine learning frameworks and present a comprehensive overview of machine learning algorithms and various evaluation metrics. Furthermore, we introduce sources of important curated plant genomic data and R packages that enable plant biologists to apply appropriate machine learning algorithms in their research easily and quickly. Finally, we discuss current applications of machine learning algorithms for identifying genes related to resistance to biotic and abiotic stress. Broad application of machine learning, together with the accumulation of plant sequencing data, will advance plant molecular studies.

imDC: an ensemble learning method for imbalanced classification with miRNA data

Wang, Guo, et al. 2015, Genet. Mol. Res.

Class imbalance is common in bioinformatics and in many other areas. A drawback of traditional machine learning methods is that they give relatively little attention to classifying small-sample (minority) classes. We therefore developed imDC, which combines an ensemble learning approach with sample weights and misclassification information to classify imbalanced data effectively. Compared with other algorithms on UCI machine learning datasets and microRNA data, our method achieved better results.
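
The abstract states only that imDC combines ensemble learning with sample weights and misclassification information; it gives no update rule. To make that idea concrete, the sketch below shows one round of standard discrete AdaBoost reweighting, a well-known technique swapped in for illustration; it is not imDC's actual algorithm. Labels are assumed to be -1/+1, and `adaboost_reweight` is a hypothetical name.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One discrete AdaBoost round for labels in {-1, +1}.

    Returns (alpha, new_weights): the base learner's vote weight and the
    renormalized sample weights. Not imDC; a generic boosting step.
    """
    # Weighted error of the current base learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    # Multiply by exp(-alpha * y * h(x)); when err < 0.5 (alpha > 0),
    # misclassified samples grow and correctly classified ones shrink.
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [r / total for r in raw]
```

With uniform weights over four samples and one mistake, for example `adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])`, the weighted error is 0.25, alpha is 0.5 ln 3, and after renormalization the misclassified sample carries weight 0.5 while each correct one carries 1/6. Upweighting mistakes in this way is one reason boosting-style ensembles are often applied to imbalanced data, where minority-class samples tend to be the ones misclassified.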
