Wenzheng Bao scite author profile

It is well known that DNA sequences contain a certain number of transcription factor (TF) binding sites, only some of which have been identified through biological experiments. However, these experiments are expensive and time-consuming. To overcome these problems, computational methods based on k-mer features or convolutional neural networks have been proposed to identify TF binding sites from DNA sequences. Although these methods perform well, they still lack the context information related to TF binding sites. Research indicates that standard recurrent neural networks (RNNs) and their variants perform better on time-series data than other models. In this study, we propose a model, named KEGRU, that identifies TF binding sites by combining a bidirectional gated recurrent unit (GRU) network with k-mer embedding. First, DNA sequences are divided into k-mer sequences with a specified length and stride window. Second, we treat each k-mer as a word and pre-train a word representation model with the word2vec algorithm. Third, we construct a deep bidirectional GRU model for feature learning and classification. Experimental results show that our method performs better than some state-of-the-art methods. Additional experiments on the embedding strategy show that k-mer embedding helps to enhance model performance. The robustness of KEGRU is demonstrated by experiments with different k-mer lengths, stride windows, and embedding vector dimensions.
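The first step described in the abstract, dividing a DNA sequence into k-mers with a given length and stride, can be sketched roughly as follows. This is a minimal illustration only, not the authors' code: the function names `kmer_tokens` and `build_vocab`, the example sequence, and the choice of k = 4 with stride 2 are all assumptions made for the example. The integer ids stand in for the "words" that a word2vec model or an embedding layer would consume.

```python
def kmer_tokens(sequence, k, stride):
    """Split a DNA sequence into k-mers of length k, sliding by `stride`.

    Hypothetical helper for illustration; trailing bases that do not
    fill a complete window are dropped.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def build_vocab(token_lists):
    """Map each distinct k-mer to an integer id, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


# Example values are assumptions, not taken from the paper.
tokens = kmer_tokens("ACGTACGTAC", k=4, stride=2)
vocab = build_vocab([tokens])
ids = [vocab[t] for t in tokens]
print(tokens)  # ['ACGT', 'GTAC', 'ACGT', 'GTAC']
print(ids)     # [0, 1, 0, 1]
```

A smaller stride yields more, heavily overlapping k-mers per sequence, while a stride equal to k yields non-overlapping ones; the abstract reports experiments over both settings without giving the values here.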


A simplified procedure for gram-scale production of sialylglycopeptide (SGP) from egg yolks and subsequent semi-synthesis of Man3GlcNAc oxazoline

Sun, Bao, Tian, et al. 2014

Carbohydrate Research


Novel human microbe-disease association prediction using network consistency projection

2017


Background: Accumulating biological and clinical reports have indicated that an imbalance of the microbial community is closely associated with the occurrence and development of various complex human diseases. Identifying potential microbe-disease associations, which could provide a better understanding of disease pathology and further improve disease diagnosis and prognosis, has attracted more and more attention. However, hardly any computational models have been developed for large-scale microbe-disease association prediction.

Results: In this article, based on the assumption that microbes with similar functions tend to share similar association or non-association patterns with similar diseases and vice versa, we proposed the model of Network Consistency Projection for Human Microbe-Disease Association prediction (NCPHMDA) by integrating known microbe-disease associations and Gaussian interaction profile kernel similarity for microbes and diseases. NCPHMDA yielded AUCs of 0.9039 and 0.7953 and an average AUC of 0.8918 in global leave-one-out cross validation, local leave-one-out cross validation, and 5-fold cross validation, respectively. Furthermore, colon cancer, asthma, and type 2 diabetes were taken as independent case studies, where 9, 9, and 8 of the top 10 predicted microbes, respectively, were confirmed by recently published clinical literature.

Conclusion: NCPHMDA is a non-parametric, universal network-based method that can simultaneously predict associated microbes for investigated diseases and does not require negative samples. It is anticipated that NCPHMDA would become an effective biological resource for clinical experimental guidance.

Electronic supplementary material: The online version of this article (10.1186/s12859-017-1968-2) contains supplementary material, which is available to authorized users.
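The abstract mentions Gaussian interaction profile kernel similarity but does not give its formula. The sketch below uses the form commonly found in the association-prediction literature, K(i, j) = exp(-γ · ||IP(i) − IP(j)||²) with γ equal to a bandwidth divided by the mean squared norm of the profiles; that the paper uses exactly this form, the bandwidth of 1.0, the function name `gip_kernel`, and the toy association matrix are all assumptions made for illustration. It does not implement the network consistency projection step itself.

```python
import math


def gip_kernel(profiles, bandwidth=1.0):
    """Gaussian interaction profile kernel over binary association profiles.

    `profiles[i]` is the interaction profile of entity i, for example one
    microbe's row of 0/1 associations across all diseases. Hypothetical
    helper; the normalisation follows the commonly used convention.
    """
    squared_norms = [sum(v * v for v in p) for p in profiles]
    mean_norm = sum(squared_norms) / len(squared_norms)
    if mean_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = bandwidth / mean_norm

    n = len(profiles)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist2)
    return kernel


# Toy microbe-by-disease association matrix (made-up values).
associations = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
similarity = gip_kernel(associations)
for row in similarity:
    print([round(x, 4) for x in row])
```

Applying the same function to the transposed matrix would give the disease-side similarity. Entities with identical profiles get similarity 1, and similarity decays as profiles diverge.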





Recurrent Neural Network for Predicting Transcription Factor Binding Sites
