Motivation: High-resolution Hi-C data are indispensable for the studies of three-dimensional (3D) genome organization at kilobase level. However, generating high-resolution Hi-C data (e.g. 5 kb) by conducting Hi-C experiments needs millions of mammalian cells, which may eventually generate billions of paired-end reads with a high sequencing cost. Therefore, it will be important and helpful if we can enhance the resolutions of Hi-C data by computational methods.
Results: We developed a new computational method named HiCNN that used a 54-layer very deep convolutional neural network to enhance the resolutions of Hi-C data. The network contains both global and local residual learning with multiple speedup techniques included resulting in fast convergence. We used mean squared errors and Pearson’s correlation coefficients between real high-resolution and computationally predicted high-resolution Hi-C data to evaluate the method. The evaluation results show that HiCNN consistently outperforms HiCPlus, the only existing tool in the literature, when training and testing data are extracted from the same cell type (i.e. GM12878) and from two different cell types in the same or different species (i.e. GM12878 as training with K562 as testing, and GM12878 as training with CH12-LX as testing). We further found that the HiCNN-enhanced high-resolution Hi-C data are more consistent with real experimental high-resolution Hi-C data than HiCPlus-enhanced data in terms of indicating statistically significant interactions. Moreover, HiCNN can efficiently enhance low-resolution Hi-C data, which eventually helps recover two chromatin loops that were confirmed by 3D-FISH.
Availability and implementation: HiCNN is freely available at http://dna.cs.miami.edu/HiCNN/.
Supplementary information: Supplementary data are available at Bioinformatics online.
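The two evaluation measures named in the abstract above, mean squared error and Pearson's correlation coefficient between a real and a predicted contact matrix, can be sketched as follows. This is a minimal illustration and not the HiCNN evaluation code: the function names, the toy 3x3 matrices, and the choice to compare all matrix entries at once (rather than, say, per genomic distance) are assumptions made here.

```python
import math


def flatten(matrix):
    """Turn a 2D contact matrix (list of rows) into a flat list of values."""
    return [value for row in matrix for value in row]


def mean_squared_error(real, predicted):
    """Average squared difference over all matrix entries."""
    r, p = flatten(real), flatten(predicted)
    if len(r) != len(p):
        raise ValueError("matrices must have the same number of entries")
    return sum((a - b) ** 2 for a, b in zip(r, p)) / len(r)


def pearson_correlation(real, predicted):
    """Pearson's correlation coefficient over all matrix entries."""
    r, p = flatten(real), flatten(predicted)
    mean_r = sum(r) / len(r)
    mean_p = sum(p) / len(p)
    covariance = sum((a - mean_r) * (b - mean_p) for a, b in zip(r, p))
    var_r = sum((a - mean_r) ** 2 for a in r)
    var_p = sum((b - mean_p) ** 2 for b in p)
    return covariance / math.sqrt(var_r * var_p)


# Toy symmetric matrices standing in for real and enhanced Hi-C data (made up).
real_matrix = [[10.0, 4.0, 1.0], [4.0, 12.0, 5.0], [1.0, 5.0, 9.0]]
predicted_matrix = [[9.0, 5.0, 1.0], [5.0, 11.0, 4.0], [1.0, 4.0, 10.0]]

mse = mean_squared_error(real_matrix, predicted_matrix)
correlation = pearson_correlation(real_matrix, predicted_matrix)
print(f"MSE = {mse:.4f}, Pearson r = {correlation:.4f}")
```

A lower MSE and a higher correlation both indicate that the predicted matrix is closer to the experimental one.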
For a learning model to be effective in online modeling of nonstationary data, it must not only be equipped with high adaptability to track the changing data dynamics but also maintain low complexity to meet online computational restrictions. Based on these two important principles, in this paper, we propose a fast adaptive gradient radial basis function (GRBF) network for nonlinear and nonstationary time series prediction. Specifically, an initial compact GRBF model, which is capable of modeling variations of the local mean and trend in the signal well, is constructed on the training data using the orthogonal least squares algorithm. During online operation, when the current model does not perform well, the worst-performing GRBF node is replaced by a new node, whose structure is optimized to fit the current data. Owing to the local one-step predictor property of the GRBF node, this adaptive node replacement can be done very efficiently. Experiments involving two chaotic time series and two real-world signals are used to demonstrate the superior online prediction performance of the proposed fast adaptive GRBF algorithm over a range of benchmark schemes, in terms of prediction accuracy and real-time computational complexity.
Health is vital to every human being. To further improve its already respectable medical technology, the medical community is transitioning towards a proactive approach that anticipates and mitigates risks before a person becomes ill. This approach requires measuring human physiological signals and analyzing these data at regular intervals. In this paper, we present a novel approach to applying deep learning to physiological signal analysis that allows doctors to identify latent risks. However, extracting high-level information from physiological time-series data is a hard problem faced by the machine learning community. Therefore, in this approach, we apply a model based on a convolutional neural network that can automatically learn features from raw physiological signals in an unsupervised manner and then, based on the learned features, use a multivariate Gaussian distribution anomaly detection method to detect anomalous data. Our experiments show that the approach performs well in physiological signal anomaly detection. It is therefore a promising tool for doctors to identify early signs of illness even if the criteria are unknown a priori.
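The second stage described above, multivariate Gaussian anomaly detection on learned features, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the feature vectors are hand-written stand-ins for CNN-learned features, the function names are hypothetical, and the threshold rule (the lowest log-density seen on normal data) is a choice made here because the abstract does not specify one.

```python
import math


def fit_gaussian(features):
    """Estimate the mean vector and covariance matrix from normal samples."""
    n, d = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    cov = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in features) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(matrix):
    """Lower-triangular factor L with L * L^T = matrix (positive definite)."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def log_density(x, mean, cov):
    """Log of the multivariate Gaussian density at x."""
    d = len(mean)
    lower = cholesky(cov)
    diff = [x[i] - mean[i] for i in range(d)]
    # Forward substitution solves lower * z = diff; then |z|^2 is the
    # squared Mahalanobis distance of x from the mean.
    z = []
    for i in range(d):
        partial = sum(lower[i][k] * z[k] for k in range(i))
        z.append((diff[i] - partial) / lower[i][i])
    mahalanobis_sq = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + mahalanobis_sq)


def is_anomaly(x, mean, cov, threshold):
    """Flag x when its log-density falls below the threshold."""
    return log_density(x, mean, cov) < threshold


# Stand-in feature vectors for normal signals (made up for illustration).
normal_features = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1],
                   [1.1, 2.2], [0.8, 1.8], [1.0, 2.0]]
mean, cov = fit_gaussian(normal_features)
# Assumed threshold: the lowest log-density observed on the normal data.
threshold = min(log_density(f, mean, cov) for f in normal_features)

print(is_anomaly([1.0, 2.0], mean, cov, threshold))  # typical sample
print(is_anomaly([3.0, 0.5], mean, cov, threshold))  # far from normal data
```

In practice the threshold would usually be tuned on held-out data, for example to trade off missed anomalies against false alarms.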
This paper proposes a selective ensemble of multiple local model learning for modeling and identification of nonlinear and nonstationary systems, in which the set of local linear models is self-adapted to capture the newly emerging process characteristics and the prediction of the process output is also self-adapted based on an optimally selected subset of the local linear models. Specifically, our selective ensemble of multiple local model learning approach performs the model adaptation at two levels. At the level of local model adaptation, a newly emerging process state in the incoming data is automatically identified and a new local linear model is fitted to this newly emerged process state. At the level of online prediction, a subset of candidate local linear models is optimally selected and the prediction of the process output is computed as an optimal linear combiner of the selected local linear models. Two case studies involving chaotic time series prediction and modeling of a real-world industrial microwave heating process are used to demonstrate the effectiveness of our proposed approach, in comparison with other existing methods for modeling and identification of nonlinear and time-varying systems.
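The online prediction level described above, selecting a subset of local linear models and combining their outputs linearly, can be sketched as follows. This is one plausible reading, not the paper's algorithm: ranking models by mean squared error on a recent data window, fixing the subset size in advance, and fitting the combiner by ordinary least squares are all assumptions made here, as are the function names and toy data. The local model adaptation level is not shown.

```python
def predict(weights, x):
    """Output of one local linear model for input vector x."""
    return sum(w * v for w, v in zip(weights, x))


def window_mse(weights, inputs, targets):
    """Mean squared error of one model on a recent data window."""
    errors = [(predict(weights, x) - y) ** 2 for x, y in zip(inputs, targets)]
    return sum(errors) / len(errors)


def select_models(models, inputs, targets, subset_size):
    """Keep the subset_size models with the lowest error on the window."""
    ranked = sorted(models, key=lambda w: window_mse(w, inputs, targets))
    return ranked[:subset_size]


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - partial) / aug[i][i]
    return x


def fit_combiner(selected, inputs, targets):
    """Least-squares combiner weights via the normal equations."""
    preds = [[predict(w, x) for w in selected] for x in inputs]
    m = len(selected)
    gram = [[sum(p[i] * p[j] for p in preds) for j in range(m)]
            for i in range(m)]
    rhs = [sum(p[i] * y for p, y in zip(preds, targets)) for i in range(m)]
    return solve(gram, rhs)


def ensemble_predict(selected, combiner, x):
    """Linearly combine the selected models' predictions."""
    return sum(c * predict(w, x) for c, w in zip(combiner, selected))


# Toy candidates: two useful local models and one poor one (made up).
models = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [0.5, 0.5, 1.0, 1.5]  # generated by y = 0.5 * x1 + 0.5 * x2

selected = select_models(models, inputs, targets, subset_size=2)
combiner = fit_combiner(selected, inputs, targets)
prediction = ensemble_predict(selected, combiner, [3.0, 2.0])
print(selected, combiner, prediction)
```

Here the poor model is dropped by the selection step, and the combiner then weights the two remaining models equally, because together they reproduce the toy targets exactly.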
We present a deep-learning package named HiCNN2 to learn the mapping between low-resolution and high-resolution Hi-C (a technique for capturing genome-wide chromatin interactions) data, which can enhance the resolution of Hi-C interaction matrices. The HiCNN2 package includes three methods each with a different deep learning architecture: HiCNN2-1 is based on one single convolutional neural network (ConvNet); HiCNN2-2 consists of an ensemble of two different ConvNets; and HiCNN2-3 is an ensemble of three different ConvNets. Our evaluation results indicate that HiCNN2-enhanced high-resolution Hi-C data achieve smaller mean squared error and higher Pearson’s correlation coefficients with experimental high-resolution Hi-C data compared with existing methods HiCPlus and HiCNN. Moreover, all of the three HiCNN2 methods can recover more significant interactions detected by Fit-Hi-C compared to HiCPlus and HiCNN. Based on our evaluation results, we would recommend using HiCNN2-1 and HiCNN2-3 if recovering more significant interactions from Hi-C data is of interest, and HiCNN2-2 and HiCNN if the goal is to achieve higher reproducibility scores between the enhanced Hi-C matrix and the real high-resolution Hi-C matrix.