Background:
The process of nucleotides modification or methyl groups addition to nucleotides is known as post-transcriptional modification (PTM). 1-methyladenosine (m1A) is a type of PTM formed by adding a methyl group to the nitrogen at the 1st position of the adenosine base. Many human disorders are associated with m1A, which is widely found in ribosomal RNA and transfer RNA.
Objective:
The conventional methods such as mass spectrometry and site-directed mutagenesis proved to be laborious and burdensome. Systematic identification of modified sites from RNA sequences is gaining much attention nowadays. Consequently, an extreme gradient boost predictor, m1A-Pred, is developed in this study for the prediction of modified m1A sites.
Method:
The current study involves the extraction of position and composition-based properties within nucleotide sequences. The extraction of features helps in the development of the features vector. Statistical moments were endorsed for dimensionality reduction in the obtained features.
Results:
Through a series of experiments using different computational models and evaluation methods, it was revealed that the proposed predictor, m1A-pred, proved to be the most robust and accurate model for the identification of modified sites.
Availability and Implementation:
To enhance the research on m1A sites, a friendly server was also developed which was the final phase of this research.
Background
Dihydrouridine (D) is a modified transfer RNA post-transcriptional modification (PTM) that occurs abundantly in bacteria, eukaryotes, and archaea. The D modification assists in the stability and conformational flexibility of tRNA. The D modification is also responsible for pulmonary carcinogenesis in humans.
Objective
For the detection of D sites, mass spectrometry and site-directed mutagenesis have been developed. However, both are labor-intensive and time-consuming methods. The availability of sequence data has provided the opportunity to build computational models for enhancing the identification of D sites. Based on the sequence data, the DHU-Pred model was proposed in this study to find possible D sites.
Methodology
The model was built by employing comprehensive machine learning and feature extraction approaches. It was then validated using in-demand evaluation metrics and rigorous experimentation and testing approaches.
Results
The DHU-Pred revealed an accuracy score of 96.9%, which was considerably higher compared to the existing D site predictors.
Availability and Implementation
A user-friendly web server for the proposed model was also developed and is freely available for the researchers.
Background Dihydrouridine (D) is one of the most significant uridine modifications that have a prominent occurrence in eukaryotes. The folding and conformational flexibility of transfer RNA (tRNA) can be attained through this modification. Objective The modification also triggers lung cancer in humans. The identification of D sites was carried out through conventional laboratory methods; however, those were costly and time-consuming. The readiness of RNA sequences helps in the identification of D sites through computationally intelligent models. However, the most challenging part is turning these biological sequences into distinct vectors. Methods The current research proposed novel feature extraction mechanisms and the identification of D sites in tRNA sequences using ensemble models. The ensemble models were then subjected to evaluation using k-fold cross-validation and independent testing. Results The results revealed that the stacking ensemble model outperformed all the ensemble models by revealing 0.98 accuracy, 0.98 specificity, 0.97 sensitivity, and 0.92 Matthews Correlation Coefficient. The proposed model, iDHU-Ensem, was also compared with pre-existing predictors using an independent test. The accuracy scores have shown that the proposed model in this research study performed better than the available predictors. Conclusion The current research contributed towards the enhancement of D site identification capabilities through computationally intelligent methods. A web-based server, iDHU-Ensem, was also made available for the researchers at https://taseersuleman-idhu-ensem-idhu-ensem.streamlit.app/ .
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.