The promoter region is located near the transcription start site and regulates transcription initiation of the gene by controlling the binding of RNA polymerase. Thus, promoter region recognition is an important area of interest in bioinformatics. Numerous tools for promoter prediction have been proposed; however, their reliability still needs to be improved. In this work, we propose a robust deep learning model, called DeePromoter, to analyze the characteristics of short eukaryotic promoter sequences and accurately recognize human and mouse promoter sequences. DeePromoter combines a convolutional neural network (CNN) and a long short-term memory (LSTM) network. Additionally, instead of using non-promoter regions of the genome as the negative set, we derive a more challenging negative set from the promoter sequences themselves. The proposed negative set reconstruction method improves discrimination ability and significantly reduces the number of false positive predictions. Consequently, DeePromoter outperforms previously proposed promoter prediction tools. In addition, a web server for promoter prediction based on the proposed methods is available at https://home.jbnu.ac.kr/NSCL/deepromoter.htm.
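The abstract does not describe how the negative set is derived from promoter sequences. The sketch below assumes one plausible scheme: split each promoter into contiguous segments, keep some segments verbatim, and overwrite the others with random nucleotides, so that each negative sample still shares part of its sequence with a real promoter. The function name `make_negative` and the defaults (20 segments, 12 replaced) are illustrative assumptions, not values confirmed by the abstract.

```python
import random
from typing import Optional


def make_negative(promoter: str, n_segments: int = 20, n_replaced: int = 12,
                  seed: Optional[int] = None) -> str:
    """Build a hypothetical 'hard negative' from a promoter sequence.

    The promoter is split into n_segments contiguous parts of nearly equal
    length; n_replaced randomly chosen parts are overwritten with random
    nucleotides and the remaining parts are kept verbatim.
    The default counts are illustrative assumptions.
    """
    if not 0 <= n_replaced <= n_segments:
        raise ValueError("n_replaced must be between 0 and n_segments")
    rng = random.Random(seed)
    length = len(promoter)
    # Segment boundaries 0 = b_0 <= b_1 <= ... <= b_n = length (nearly equal parts).
    bounds = [round(i * length / n_segments) for i in range(n_segments + 1)]
    replaced = set(rng.sample(range(n_segments), n_replaced))
    pieces = []
    for i in range(n_segments):
        segment = promoter[bounds[i]:bounds[i + 1]]
        if i in replaced:
            # Random nucleotides, same length as the original segment.
            pieces.append("".join(rng.choice("ACGT") for _ in segment))
        else:
            pieces.append(segment)  # conserved part of the real promoter
    return "".join(pieces)


# Usage on a 300-nt toy sequence (not real promoter data).
negative = make_negative("TATAAAAG" * 37 + "ACGT", seed=0)
```

Because the kept segments are copied unchanged, such negatives cannot be separated from promoters by sequence composition alone, which is presumably what makes a set of this kind harder to discriminate.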
Background: Predicting protein-ligand binding sites is a fundamental step in understanding the functional characteristics of proteins; it plays a vital role in elucidating biological functions and is a crucial step in drug discovery. A protein exhibits its true nature after binding to its interacting molecule, known as a ligand, which binds only at a favorable binding site in the protein structure. Various computational methods exploiting protein features have been developed to identify binding sites in protein structures, but none has yet provided satisfactory results, so further investigation is required.

Results: In this study, we present a deep learning model, PUResNet, and a novel data-cleaning process based on structural similarity for predicting protein-ligand binding sites. From the whole scPDB database (an annotated database of druggable binding sites extracted from the Protein Data Bank), 5020 protein structures were selected and used to train PUResNet. With this approach, we achieved better and justifiable performance than existing methods on two independent evaluation sets, measured with distance, volume, and proportion metrics.
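The abstract names distance, volume, and proportion metrics without defining them. As a hedged illustration, the sketch below computes two measures often used to evaluate binding-site predictions: the distance between the geometric centers of the predicted and true sites, and a discretized volume overlap (intersection over union of occupied voxels). Whether PUResNet's metrics are defined exactly this way is an assumption; the function names and the 1 Å voxel spacing are hypothetical.

```python
import math
from typing import Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def centroid(points: Sequence[Point]) -> Point:
    """Geometric center of a set of 3D coordinates (e.g. atom positions in Å)."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def center_distance(pred: Sequence[Point], true: Sequence[Point]) -> float:
    """Euclidean distance between the centers of the predicted and true sites."""
    return math.dist(centroid(pred), centroid(true))


def voxelize(points: Sequence[Point], spacing: float = 1.0) -> Set[Voxel]:
    """Map coordinates to the integer grid cells (voxels) they fall in."""
    return {(math.floor(x / spacing), math.floor(y / spacing), math.floor(z / spacing))
            for x, y, z in points}


def volume_overlap(pred: Set[Voxel], true: Set[Voxel]) -> float:
    """Intersection over union of occupied voxels (0 = disjoint, 1 = identical)."""
    union = pred | true
    return len(pred & true) / len(union) if union else 0.0
```

A prediction is typically counted as a hit when the center distance falls below a fixed cutoff (4 Å is a common choice in the literature), but the cutoff used for PUResNet is not stated in the abstract.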
Pseudouridine is the most prevalent RNA modification and has been found in both eukaryotes and prokaryotes. It has been identified in several kinds of RNA, such as small nuclear RNA, rRNA, tRNA, mRNA, and small nucleolar RNA, which makes it of considerable interest for academic research and drug development. Biochemical experiments have identified pseudouridine sites with good results, but these laboratory methods are expensive and time-consuming. Therefore, it is important to develop efficient methods for identifying pseudouridine sites. In this study, we developed a deep learning method for identifying pseudouridine sites, called iPseU-CNN (identifying pseudouridine by convolutional neural networks). Existing methods identify pseudouridine sites using handcrafted features and machine learning approaches, whereas the proposed predictor extracts the features of pseudouridine sites automatically using a convolutional neural network. iPseU-CNN outperforms current state-of-the-art models on all evaluation metrics. We therefore expect iPseU-CNN to become a helpful tool for academic research on RNA pseudouridine site prediction, as well as for drug discovery.
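To illustrate what extracting features automatically means in a sequence CNN, the sketch below one-hot encodes an RNA sequence and slides a single convolution kernel over it, followed by ReLU and global max pooling. In a trained network the kernel weights are learned from data; here the kernel is hand-set to the one-hot pattern of a hypothetical motif `UGU` so the output is easy to read. This is not the iPseU-CNN architecture, which the abstract does not specify.

```python
from typing import List

NUCLEOTIDES = "ACGU"


def one_hot(seq: str) -> List[List[int]]:
    """Encode an RNA sequence as an L x 4 matrix (columns A, C, G, U).
    Unknown characters become all-zero rows."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def conv1d_relu(x: List[List[int]], kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """'Valid' 1D convolution (cross-correlation, as in deep learning frameworks)
    of an L x C input with a K x C kernel, followed by ReLU."""
    k_len = len(kernel)
    out = []
    for i in range(len(x) - k_len + 1):
        s = bias
        for k in range(k_len):
            for c in range(len(kernel[k])):
                s += x[i + k][c] * kernel[k][c]
        out.append(max(0.0, s))  # ReLU
    return out


def global_max_pool(values: List[float]) -> float:
    """Keep the strongest response of the filter anywhere in the sequence."""
    return max(values)


# Hand-set filter that responds most strongly to the hypothetical motif "UGU".
scores = conv1d_relu(one_hot("AAUGUAA"), one_hot("UGU"))
feature = global_max_pool(scores)
```

Here `scores` peaks at the position where `UGU` occurs, and the pooled value summarizes whether the motif is present at all; a trained CNN learns many such filters instead of relying on handcrafted features.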
Object detection in very high-resolution (VHR) aerial images is an essential step for a wide range of applications, such as military applications, urban planning, and environmental management. It remains a challenging task, however, because objects vary widely in scale and appearance. That said, object detection in VHR aerial images has improved remarkably in recent years thanks to advances in convolutional neural networks (CNNs). Most proposed methods, such as Faster R-CNN, follow a two-stage approach consisting of a region proposal stage and a classification stage. Although two-stage approaches outperform traditional methods, they are difficult to optimize and are not suitable for real-time applications. In this paper, we propose a unified one-stage model for object detection in VHR aerial images. To tackle the challenge of varying object scales, we propose a densely connected feature pyramid network that provides high-level, multi-scale semantic feature maps with high-quality information for object detection. The model was evaluated on two publicly available datasets and outperformed the current state-of-the-art results on both in terms of mean average precision (mAP) and computation time.
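The abstract does not detail how the feature pyramid is densely connected. The sketch below assumes one plausible reading: in the top-down pathway, each fused level receives upsampled maps from every coarser fused level, not only from its immediate neighbor as in a standard feature pyramid network. It is simplified to single-channel 2D lists, uses summation instead of concatenation, assumes each level halves the resolution, and omits the lateral and smoothing convolutions; the function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def upsample(grid: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling by an integer factor."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[r // factor][c // factor] for c in range(cols * factor)]
            for r in range(rows * factor)]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dense_top_down(features: List[Grid]) -> List[Grid]:
    """features[0] is the finest level; each later level halves the resolution.

    Fused level i = features[i] + sum over all coarser j > i of
    upsample(fused[j], 2 ** (j - i)), computed from coarsest to finest.
    """
    n = len(features)
    fused: List[Grid] = [[] for _ in range(n)]
    for i in range(n - 1, -1, -1):  # coarsest level first
        p = features[i]
        for j in range(i + 1, n):  # dense links from every coarser level
            p = add(p, upsample(fused[j], 2 ** (j - i)))
        fused[i] = p
    return fused


# Toy pyramid: 4x4, 2x2 and 1x1 maps (not real backbone features).
pyramid = dense_top_down([[[0.0] * 4 for _ in range(4)],
                          [[0.0] * 2 for _ in range(2)],
                          [[1.0]]])
```

In this toy pyramid the finest map receives the coarse signal twice, once directly and once through the middle level, which shows how dense links let high-level semantics reach every scale.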
The epigenetic modification DNA N4-methylcytosine (4mC) plays an important role in gene expression, DNA repair, and DNA replication. It also plays a crucial role in restriction-modification systems. Accurate prediction of 4mC sites in DNA is needed to understand their functional behavior, which in turn supports both drug discovery and biomedical research. Therefore, an accurate computational model is required. In this work, we present an efficient one-dimensional convolutional neural network (CNN) model, called 4mCCNN, for identifying 4mC sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus, and Geobacter pickeringii. Existing methods identify 4mC sites using machine learning algorithms with handcrafted features, whereas the proposed model extracts the features of 4mC sites automatically from DNA sequences using a CNN. Evaluated on benchmark datasets, the proposed model generally achieved better results in identifying 4mC sites than state-of-the-art predictors. The 4mCCNN model is available as a web server at https://home.jbnu.ac.kr/NSCL/4mCCNN.htm.

INDEX TERMS Convolutional neural network, DNA methylation, DNA N4-methylcytosine (4mC), sequence analysis.
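The abstract reports generally better results without naming the metrics. Sequence-based site predictors in this area are commonly compared using sensitivity (Sn), specificity (Sp), accuracy (Acc), and the Matthews correlation coefficient (MCC); whether 4mCCNN was evaluated with exactly these is an assumption. The sketch below computes them from binary labels (1 = 4mC site, 0 = non-site) and predictions; the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sn, Sp, Acc and MCC for binary labels (1 = positive site, 0 = negative)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sn = tp / (tp + fn) if tp + fn else 0.0          # Sn = TP / (TP + FN)
    sp = tn / (tn + fp) if tn + fp else 0.0          # Sp = TN / (TN + FP)
    acc = (tp + tn) / len(y_true)                    # Acc = (TP + TN) / N
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Toy example: 3 true sites and 5 non-sites (not data from the paper).
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
```

MCC is often reported alongside accuracy because it stays informative when positive and negative sets are imbalanced.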
Alternative splicing (AS) is a regulated process during gene expression by which a single gene can code for multiple proteins. It is carried out by a complex called the spliceosome, through which certain exons of a gene may be included in or excluded from the final mRNA produced from that gene. In AS, at least three prominent signals exist in introns: the 5' splice site (5'ss), or donor site, where GU nucleotides are most frequently present; the 3'ss, or acceptor site, where AG nucleotides are most frequently present; and the branch point site. The branch point is generally located 20 to 50 nucleotides upstream of the 3'ss. In this paper, we identify the branch point location using a computational model based on deep learning. We propose a hybrid model that combines a dilated convolutional neural network and a recurrent neural network. We also studied integrating additional inputs with the raw RNA sequence, such as conservation, binding energy, and dinucleotide features. The proposed model was evaluated on two publicly available datasets and outperformed the current state-of-the-art methods. Specifically, on the first dataset it achieved an area under the receiver operating characteristic curve (ROC-AUC) of 97.29% and an area under the precision-recall curve (prAUC) of 67.08%; on the second dataset, it achieved a ROC-AUC of 96.86% and a prAUC of 69.62%. In addition, pathogenic variants were analyzed with the proposed model, and the results agreed biologically with previously reported ones. An easy-to-use web server for studying RNA branch point selection is freely available at https://home.jbnu.ac.kr/NSCL/rnabps.htm.

INDEX TERMS Branch point, deep learning, dilated convolutional network, long short-term memory, splicing.
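A dilated convolution spaces its kernel taps `dilation` positions apart, so stacking layers with doubling dilations makes the receptive field grow exponentially with depth while the number of weights per layer stays fixed. The sketch below shows a single-channel dilated convolution and the receptive-field arithmetic for stride-1 layers, RF = 1 + Σ (k − 1)·d_l. The kernel size and dilation schedule are hypothetical, chosen only to show that a small stack can span the region 20 to 50 nt upstream of the 3'ss; this does not reproduce the authors' architecture.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """'Valid' 1D dilated convolution (cross-correlation) of a single-channel signal.
    Output i combines x[i], x[i + d], ..., x[i + (K - 1) * d]."""
    span = (len(kernel) - 1) * dilation
    return [sum(kernel[k] * x[i + k * dilation] for k in range(len(kernel)))
            for i in range(len(x) - span)]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of a stack of stride-1 dilated convolutions:
    RF = 1 + sum over layers of (kernel_size - 1) * dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical schedule: kernel size 3 with dilations 1, 2, 4, 8, 16.
rf = receptive_field(3, [1, 2, 4, 8, 16])
```

Here `rf` is 63 nt, so each output position of the fifth layer sees the whole 20 to 50 nt window upstream of the 3'ss; an undilated stack with kernel size 3 would need 31 layers for the same span.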