Rasna R. Walia scite author profile

Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art

Background: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition ‘code’ that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction.

Results: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) structure-based classifiers that use a smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as a sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However...
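The abstract above reports performance as MCC and notes that residue-level and protein-level performance can differ. The following is a minimal sketch of that distinction, not the authors' evaluation code: it uses the standard MCC formula, and the function names and the toy confusion-matrix counts are illustrative assumptions rather than values from the paper.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def residue_level_mcc(per_protein_counts):
    """Pool residues from all proteins, then compute a single MCC."""
    tp, fp, tn, fn = (sum(c[i] for c in per_protein_counts) for i in range(4))
    return mcc(tp, fp, tn, fn)

def protein_level_mcc(per_protein_counts):
    """Compute MCC for each protein separately, then average."""
    return sum(mcc(*c) for c in per_protein_counts) / len(per_protein_counts)

# Hypothetical (tp, fp, tn, fn) counts for two proteins; not real data.
toy_counts = [(30, 10, 150, 10), (2, 8, 80, 10)]

print("residue-level MCC:", round(residue_level_mcc(toy_counts), 3))
print("protein-level MCC:", round(protein_level_mcc(toy_counts), 3))
```

In this toy case the larger, well-predicted protein dominates the pooled estimate, so the residue-level value is higher than the per-protein average.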

RNABindRPlus: A Predictor that Combines Machine Learning and Sequence Homology-Based Methods to Improve the Reliability of Predicted RNA-Binding Residues in Proteins

Walia et al. 2014

Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence-derived features of RNA-binding proteins. A webserver implementation of both methods is freely available at http://einstein.cs.iastate.edu/RNABindRPlus/.

PRIDB: a protein-RNA interface database

Lewis, Walia, Terribilini et al. 2010

Nucleic Acids Research

The Protein–RNA Interface Database (PRIDB) is a comprehensive database of protein–RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein–RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein–RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein–RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein–RNA complexes are provided. The PRIDB database is freely available online at http://bindr.gdcb.iastate.edu/PRIDB.

Reassortment between Swine H3N2 and 2009 Pandemic H1N1 in the United States Resulted in Influenza A Viruses with Diverse Genetic Constellations with Variable Virulence in Pigs

Rajão, Walia, Campbell et al. 2017

J Virol

Repeated spillovers of the H1N1 pandemic virus (H1N1pdm09) from humans to pigs resulted in substantial evolution of influenza A viruses infecting swine, contributing to the genetic and antigenic diversity of influenza A viruses (IAV) currently circulating in swine. The reassortment with endemic swine viruses and maintenance of some of the H1N1pdm09 internal genes resulted in the circulation of different genomic constellations in pigs. Here, we performed a whole-genome phylogenetic analysis of 368 IAV circulating in swine from 2009 to 2016 in the United States. We identified 44 different genotypes, with the most common genotype (32.33%) containing a clade IV-A HA gene, a 2002-lineage NA gene, an M-pdm09 gene, and remaining gene segments of triple reassortant internal gene (TRIG) origin. To understand how different genetic constellations may relate to viral fitness, we compared the pathogenesis and transmission in pigs of six representative genotypes. Although all six genotypes efficiently infected pigs, they resulted in different degrees of pathology and viral shedding. These results highlight the vast H3N2 genetic diversity circulating in U.S. swine after 2009. This diversity has important implications in the control of this disease by the swine industry, as well as a potential risk for public health if swine-adapted viruses with H1N1pdm09 genes have an increased risk to humans, as occurred in the 2011-2012 and 2016 human variant H3N2v cases associated with exhibition swine.

IMPORTANCE: People continue to spread the 2009 H1N1 pandemic (H1N1pdm09) IAV to pigs, allowing H1N1pdm09 to reassort with endemic swine IAV. In this study, we determined the 8 gene combinations of swine H3N2 IAV detected from 2009 to 2016. We identified 44 different genotypes of H3N2, the majority of which contained at least one H1N1pdm09 gene segment. We compared six representative genotypes of H3N2 in pigs. All six genotypes efficiently infected pigs, but they resulted in different degrees of lung damage and viral shedding. These results highlight the vast genetic diversity of H3N2 circulating in U.S. swine after 2009, with important implications for the control of IAV for the swine industry. Because H1N1pdm09 is also highly adapted to humans, these swine viruses pose a potential risk to public health if swine-adapted viruses with H1N1pdm09 genes also have an increased risk for human infection.
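The genotype tally described in this abstract (distinct combinations of lineages across the 8 gene segments, and the share of the most common one) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' phylogenetic pipeline: the lineage labels are assumed to have been assigned already, and the helper names and the toy isolates are hypothetical.

```python
from collections import Counter

# The eight influenza A gene segments.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")

def genotype_frequencies(isolates):
    """Count distinct 8-segment lineage combinations.

    isolates: list of dicts mapping segment name -> lineage label.
    Returns (genotype, count, percent) tuples, most common first.
    """
    counts = Counter(tuple(iso[s] for s in SEGMENTS) for iso in isolates)
    total = sum(counts.values())
    return [(g, n, 100.0 * n / total) for g, n in counts.most_common()]

def make_isolate(ha, na, m, internal="TRIG"):
    """Hypothetical helper: build a toy isolate with the given lineages."""
    iso = {s: internal for s in SEGMENTS}
    iso.update(HA=ha, NA=na, M=m)
    return iso

# Toy data only; labels echo the abstract but the counts are invented.
toy_isolates = [
    make_isolate("IV-A", "2002", "pdm09"),
    make_isolate("IV-A", "2002", "pdm09"),
    make_isolate("IV-A", "2002", "TRIG"),
    make_isolate("IV-B", "2002", "pdm09", internal="pdm09"),
]

for genotype, n, pct in genotype_frequencies(toy_isolates):
    print(dict(zip(SEGMENTS, genotype)), n, f"{pct:.2f}%")
```

Representing each genotype as a tuple in a fixed segment order makes it hashable, so identical constellations collapse into a single counter key.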

Influenza A virus vaccines for swine

Vincent, Pérez, Rajão et al. 2017

Veterinary Microbiology

Regional patterns of genetic diversity in swine influenza A viruses in the United States from 2010 to 2016

Walia, Anderson, Vincent 2019

Influenza Resp Viruses

Influenza A(H3N2) Virus in Swine at Agricultural Fairs and Transmission to Humans, Michigan and Ohio, USA, 2016

Bowman, Walia, Nolting et al. 2017

Emerg. Infect. Dis.

Detection and characterization of an H4N6 avian-lineage influenza A virus in pigs in the Midwestern United States

et al. 2017

H4Nx viruses were reported in swine in Canada and China, but had not been recognized in swine in the USA. In late 2015, an avian-origin H4N6 influenza A virus was isolated from pigs in the United States during a routine diagnostic investigation of clinical respiratory disease in the herd. Serological analysis of additional pigs at the farm and other pigs within the swine production system indicated that the virus did not efficiently transmit from pig to pig, and the mode of transmission to swine could not be determined. The isolate was characterized at the molecular level, and its pathogenesis and transmission were experimentally evaluated in pigs. Although the virus replicated in the lungs of pigs and caused mild pulmonary lesions, there was no evidence of replication in the upper respiratory tract or transmission to indirect contacts, supporting the findings on the farm.

Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art
