In this work, we study two approaches to predicting RNA-protein interaction (RPI). In the first approach, we use a feature-based technique that combines features extracted from both sequences and secondary structures. This feature-based approach improved prediction accuracy because it incorporates considerably more of the available information about each RNA-protein pair. In the second approach, we apply search algorithms and data structures to extract effective string patterns for RPI prediction, using both sequence information (protein and RNA sequences) and structure information (protein and RNA secondary structures). This yields several string-based models for predicting interacting RNA-protein pairs. We present results that demonstrate the effectiveness of both approaches, including comparisons against leading state-of-the-art methods.
Interactions between RNAs and proteins significantly influence cellular processes. These interactions are crucial for understanding gene expression and gene regulation, as well as their roles in various diseases. Experimental methods for studying these interactions are hampered by high cost and by the combinatorial nature of the problem. Consequently, computational and machine learning methods have been applied to predict interactions between RNAs and proteins. RNAs are sequences of nucleotides, while proteins are sequences of amino acids. Protein secondary structure describes the local conformation of the amino-acid chain, such as alpha-helices and beta-strands, while RNA secondary structure describes the base pairing between nucleotides. Early methods predicted RNA-protein interaction using only sequence information, but recent methods have shown that secondary structure is important for understanding RNA-protein interactions. In this thesis, we explore prediction models for RNA-protein interaction using two different schemes. The first applies string algorithms to extract the most effective string patterns from both sequences and secondary structures; this method achieved a prediction accuracy of 93.39%. The second uses a feature-based approach that combines features extracted from both sequences and secondary structures. Because it incorporates considerably more of the available information, the feature-based approach improved the prediction accuracy to 94.77%.
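The abstracts do not specify which features are extracted from sequences and secondary structures. Purely as an illustration, one common way to turn the four strings of an RNA-protein pair into a single fixed-length vector is k-mer frequency counting. The Python sketch below assumes dot-bracket notation for RNA secondary structure and a three-state (H/E/C) alphabet for protein secondary structure; the names `kmer_frequencies` and `pair_features`, the alphabets, and the choice of k are hypothetical and are not the feature set used in this work.

```python
from collections import Counter
from itertools import product

# Assumed alphabets (hypothetical choices, not taken from this work).
RNA_SEQ = "ACGU"                      # RNA nucleotides
RNA_SS = "(.)"                        # dot-bracket RNA secondary structure
PROT_SEQ = "ACDEFGHIKLMNPQRSTVWY"     # the 20 standard amino acids
PROT_SS = "CEH"                       # 3-state protein structure: coil, strand, helix


def kmer_frequencies(s, alphabet, k):
    """Return the relative frequency of every length-k word over `alphabet` in `s`.

    Windows containing symbols outside `alphabet` still count toward the total,
    but do not contribute to any feature.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    windows = [s[i:i + k] for i in range(len(s) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero for short strings
    return [counts[m] / total for m in kmers]


def pair_features(rna_seq, rna_ss, prot_seq, prot_ss, k_rna=3, k_prot=2):
    """Concatenate k-mer frequency vectors from the four strings of one RNA-protein pair."""
    return (kmer_frequencies(rna_seq, RNA_SEQ, k_rna)
            + kmer_frequencies(rna_ss, RNA_SS, k_rna)
            + kmer_frequencies(prot_seq, PROT_SEQ, k_prot)
            + kmer_frequencies(prot_ss, PROT_SS, k_prot))


vec = pair_features("GGACUUCGGUCC", "((((....))))", "MKTAYIAKQR", "CHHHHHHHCC")
print(len(vec))  # 4**3 + 3**3 + 20**2 + 3**2 = 500
```

With these defaults every pair maps to a 500-dimensional vector regardless of sequence length, which is what lets a standard classifier consume sequence and structure information together.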