Post-transcriptional modifications such as N6-methyladenosine (m6A) play a crucial role in the stability and regulation of gene expression. Identifying m6A sites is therefore essential for understanding the functional mechanisms of biological processes. Several machine learning techniques based on hand-crafted feature extraction methods have been proposed to ease this laborious work. However, because their feature extraction is inefficient, these techniques increase computational complexity and thereby reduce the accuracy of m6A identification. This paper proposes a fast and reliable predictive model for the identification of m6A sites. The proposed model is based on a convolutional neural network (CNN) that extracts the most significant features from RNA sequences encoded by concatenating one-hot and nucleotide chemical property representations. The model is applied to and tested on benchmark datasets from multiple species and evaluated against state-of-the-art predictive models. The results indicate that the proposed model achieves high accuracy of 93.6%, 93.8%, 85.0%, and 92.5% on the datasets of Homo sapiens (H. sapiens), Mus musculus (M. musculus), Saccharomyces cerevisiae (S. cerevisiae), and Arabidopsis thaliana (A. thaliana), respectively. The proposed model could help the research community with m6A identification. In addition, an easy-to-use web server is available at https://home.jbnu.ac.kr/NSCL/pm6acnn.htm.
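As an illustration of the encoding step described above, the following is a minimal sketch of concatenating a one-hot vector with a nucleotide chemical property vector for each base. The abstract does not give the exact mappings, so the one-hot ordering (A, C, G, U) and the three-bit property code (ring structure, functional group, hydrogen bonding) are assumptions based on a commonly used convention; the function name `encode_rna` is hypothetical.

```python
# Sketch only: both lookup tables are assumed conventions, not taken from the paper.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# Assumed property bits: (purine ring, amino group, weak hydrogen bonding).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_rna(seq):
    """Return one 7-value tuple per base: 4 one-hot bits followed by 3 property bits."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [ONE_HOT[base] + NCP[base] for base in seq]


if __name__ == "__main__":
    for row in encode_rna("GGACU"):
        print(row)
```

A sequence of length L therefore becomes an L x 7 matrix, which is the kind of input a one-dimensional CNN can consume directly.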
N4-acetylcytidine (ac4C) is a post-transcriptional modification in mRNA that plays a major role in the stability and regulation of mRNA translation. The working mechanism of ac4C modification in mRNA is still unclear, and traditional laboratory experiments are time-consuming and expensive. Therefore, we propose XG-ac4C, a machine learning model based on the eXtreme Gradient Boosting (XGBoost) classifier, for the identification of ac4C sites. The XG-ac4C model encodes the nucleotides in ac4C sites by combining electron-ion interaction pseudopotential (EIIP) values with the EIIP of trinucleotides. Moreover, Shapley additive explanations (SHAP) and local interpretable model-agnostic explanations (LIME) are applied to understand the importance of the features and their contribution to the final prediction. The results demonstrate that XG-ac4C outperforms existing state-of-the-art methods. In more detail, the proposed model improves the area under the precision-recall curve by 9.4% and 9.6% in cross-validation and independent tests, respectively. Finally, a user-friendly web server based on the proposed model for ac4C site identification is freely available at http://nsclbio.jbnu.ac.kr/tools/xgac4c/.
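The two feature families named above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the per-nucleotide electron-ion interaction pseudopotential values are the commonly cited ones, reusing the thymine value for uracil is an assumption, and the trinucleotide feature is assumed to be the summed pseudopotential of each 3-mer weighted by its frequency in the sequence. The function names are hypothetical.

```python
from itertools import product

# Commonly cited pseudopotential values; using the T value for U is an assumption.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}

# All 64 trinucleotides in a fixed lexicographic order (AAA, AAC, ..., UUU).
TRINUCS = ["".join(p) for p in product("ACGU", repeat=3)]


def eiip_per_base(seq):
    """Map every nucleotide to its pseudopotential value."""
    seq = seq.upper().replace("T", "U")
    return [EIIP[base] for base in seq]


def pse_eiip(seq):
    """Return 64 values: summed pseudopotential of each 3-mer times its frequency."""
    seq = seq.upper().replace("T", "U")
    n_windows = len(seq) - 2
    if n_windows < 1:
        raise ValueError("sequence must contain at least three nucleotides")
    counts = dict.fromkeys(TRINUCS, 0)
    for i in range(n_windows):
        counts[seq[i:i + 3]] += 1
    return [
        sum(EIIP[base] for base in tri) * counts[tri] / n_windows
        for tri in TRINUCS
    ]


if __name__ == "__main__":
    seq = "ACGUACGUCC"
    features = eiip_per_base(seq) + pse_eiip(seq)
    print(len(features))
```

Concatenating the two lists gives a fixed-length vector for fixed-length input windows, which is a convenient form for a gradient-boosted tree classifier.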
DNA is subject to epigenetic modification by the molecule N4-methylcytosine (4mC). N4-methylcytosine plays a crucial role in DNA repair and replication, protects host DNA from degradation, and regulates DNA expression. Although current experimental techniques can identify 4mC sites, they are expensive and laborious. Computational tools that can predict 4mC sites would therefore be very useful for understanding the biological mechanism of this vital type of DNA modification. Conventional machine-learning-based methods rely on hand-crafted features, whereas the method proposed here saves time and computational cost by using learned features instead. In this study, we propose i4mC-Deep, an intelligent predictor based on a convolutional neural network (CNN) that predicts 4mC modification sites in DNA samples. The CNN automatically extracts important features from the input samples during training. Nucleotide chemical properties and nucleotide density, which together represent a DNA sequence, serve as the CNN input. The proposed method outperforms several state-of-the-art predictors. When i4mC-Deep was used to analyze G. subterraneus DNA, accuracy improved by 3.9% and MCC increased by 10.5% compared to a conventional predictor.
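The input representation mentioned above, nucleotide chemical properties combined with nucleotide density, can be sketched as below. The three-bit property code is an assumed common convention, and nucleotide density is assumed to mean the running fraction of positions up to and including position i that hold the same base as position i; neither definition is spelled out in the abstract, and `encode_ncp_nd` is a hypothetical name.

```python
# Assumed property bits for DNA: (purine ring, amino group, weak hydrogen bonding).
NCP_DNA = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode_ncp_nd(seq):
    """Return one 4-value tuple per base: 3 property bits plus the running density."""
    seen = {}
    rows = []
    for position, base in enumerate(seq.upper(), start=1):
        seen[base] = seen.get(base, 0) + 1
        density = seen[base] / position  # share of the prefix made up of this base
        rows.append(NCP_DNA[base] + (density,))
    return rows


if __name__ == "__main__":
    for row in encode_ncp_nd("AACGT"):
        print(row)
```

Unlike a purely per-base code, the density column carries positional context, so identical bases at different positions can receive different feature values.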