Protein secondary structure prediction (PSSP) is an important research field in bioinformatics. Protein sequence features can be represented as a matrix with an amino-acid residue (time-step) dimension and a feature vector dimension. Common approaches to predicting secondary structure focus only on the amino-acid residue dimension. However, the feature vector dimension may also contain useful information for PSSP. To integrate the information on both dimensions of the matrix, we propose a hybrid deep learning framework, the two-dimensional convolutional bidirectional recurrent neural network (2C-BRNN), for improving the accuracy of 8-class secondary structure prediction. The proposed framework first extracts discriminative local interactions between amino-acid residues with two-dimensional convolutional neural networks (2DCNNs), and then captures long-range interactions between amino-acid residues with bidirectional gated recurrent units (BGRUs) or bidirectional long short-term memory (BLSTM). Specifically, the 2C-BRNN framework consists of four models: 2DConv-BGRUs, 2DCNN-BGRUs, 2DConv-BLSTM and 2DCNN-BLSTM. The 2DConv- models contain only two-dimensional (2D) convolution operations, whereas the 2DCNN- models contain both 2D convolution and pooling operations. Experiments are conducted on four public datasets. The experimental results show that our proposed 2DConv-BLSTM model performs significantly better than the benchmark models. Furthermore, the experiments demonstrate that the proposed models can extract more meaningful features from the protein feature matrix, and that the feature vector dimension is also useful for PSSP. The code and datasets for our proposed methods are available at https://github.com/guoyanb/JBCB2018/ .
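The data flow described in the abstract above (2D convolution over the residue-by-feature matrix, then a bidirectional recurrent layer over residues, then an 8-class prediction per residue) can be illustrated with a small sketch. This is not the authors' implementation: it uses a single 3x3 filter, a bias-free GRU, untrained random weights, and no pooling, and all names (`conv2d_same`, `GRUCell`, `predict`) and sizes are hypothetical choices made for illustration only.

```python
import math
import random

def conv2d_same(x, kernel):
    """Zero-padded 2D cross-correlation + ReLU over a (residues x features) matrix."""
    rows, cols = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < rows and 0 <= c < cols:
                        s += x[r][c] * kernel[a][b]
            out[i][j] = max(0.0, s)  # ReLU
    return out

def rand_matrix(rng, n_rows, n_cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(n_cols)] for _ in range(n_rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class GRUCell:
    """Bias-free GRU cell; each gate acts on the concatenation [x, h]."""
    def __init__(self, rng, n_in, n_hidden):
        self.n_hidden = n_hidden
        self.w_z = rand_matrix(rng, n_hidden, n_in + n_hidden)  # update gate
        self.w_r = rand_matrix(rng, n_hidden, n_in + n_hidden)  # reset gate
        self.w_h = rand_matrix(rng, n_hidden, n_in + n_hidden)  # candidate state

    def step(self, x, h):
        z = [sigmoid(v) for v in matvec(self.w_z, x + h)]
        r = [sigmoid(v) for v in matvec(self.w_r, x + h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in matvec(self.w_h, x + gated)]
        # One common convention: z interpolates from the old state to the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def run_gru(cell, seq):
    h = [0.0] * cell.n_hidden
    outputs = []
    for x in seq:
        h = cell.step(x, h)
        outputs.append(h)
    return outputs

def bidirectional_gru(fwd, bwd, seq):
    forward = run_gru(fwd, seq)
    backward = run_gru(bwd, seq[::-1])[::-1]  # re-align to residue order
    return [hf + hb for hf, hb in zip(forward, backward)]  # concatenate per residue

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]

def predict(x, kernel, fwd, bwd, w_out):
    feats = conv2d_same(x, kernel)                # local patterns on both dimensions
    states = bidirectional_gru(fwd, bwd, feats)   # long-range context along residues
    return [softmax(matvec(w_out, s)) for s in states]

# Hypothetical sizes: 6 residues, 5 features, 4 hidden units, 8 classes.
rng = random.Random(0)
L, D, H, C = 6, 5, 4, 8
x = rand_matrix(rng, L, D, scale=1.0)
kernel = rand_matrix(rng, 3, 3)
fwd, bwd = GRUCell(rng, D, H), GRUCell(rng, D, H)
w_out = rand_matrix(rng, C, 2 * H)
probs = predict(x, kernel, fwd, bwd, w_out)
```

Because the convolution slides over both axes, each output value mixes neighbouring residues and neighbouring feature entries, which is the point the abstract makes about using the feature vector dimension. Each row of `probs` is a probability distribution over the 8 classes for one residue.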
Diagnosis prediction exploits electronic health records (EHRs) to predict the future diagnoses of patients, further supporting clinical decision making and personalized treatments. However, a patient's EHR is an irregular sequence of visits that contains a large number of medical concepts. The disease progression patterns are closely related to the visits, as well as the contextual knowledge of each visit. The existing diagnosis prediction methods ignore the complex relationships between the visits and the contextual knowledge, and thus cannot achieve satisfactory performance. Therefore, we develop a knowledge-aware representation learning method to comprehensively model these complex relationships. Specifically, we first construct a medical knowledge graph to model the correlations between medical concepts in EHRs, and project the contextual knowledge into the pre-learned vectors. We then devise an enhanced gated recurrent unit (GRU) neural network to extract the long-term intra-relationships between visits, and design a novel knowledge attention module to capture the complex inter-relationships between the visits and the contextual knowledge. Armed with these, we provide a powerful and flexible framework to capture the long-term discriminative disease progression patterns for diagnosis prediction. Intensive experiments are conducted on two real-world EHR datasets.
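The abstract above does not specify how its knowledge attention module is computed. As a rough illustration of the general idea only, the sketch below uses plain dot-product attention: a visit's hidden state scores each knowledge vector, and the softmax-weighted sum becomes a knowledge context for that visit. The function name `attend` and the toy vectors are hypothetical, and this generic mechanism is an assumption rather than the authors' module.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(hidden, knowledge_vectors):
    """Dot-product attention of one visit state over knowledge vectors."""
    scores = [sum(h * k for h, k in zip(hidden, vec)) for vec in knowledge_vectors]
    weights = softmax(scores)
    dim = len(knowledge_vectors[0])
    # Weighted sum of knowledge vectors, one entry per embedding dimension.
    context = [sum(w * vec[d] for w, vec in zip(weights, knowledge_vectors))
               for d in range(dim)]
    return weights, context

# Toy example: one visit state and three pre-learned knowledge vectors.
hidden = [1.0, 0.0]
knowledge = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend(hidden, knowledge)
```

In this toy case the first knowledge vector is most aligned with the visit state, so it receives the largest weight. In a full model the context vector would typically be combined with the GRU state before the prediction layer.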