Background: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition ‘code’ that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction.
Results: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. 
Our results show that (i) sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity-based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) structure-based classifiers that use a smoothed PSSM representation (SmoPSSMStr) outperform those that use a PSSM (PSSMStr) or a sequence identity-based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers at the residue level can be markedly different from that at the protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However...
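As an illustration of what a sequence-window, identity-based encoding might look like, the sketch below one-hot encodes each residue together with its neighbours in the sequence. It is a minimal sketch and not the authors' implementation: the function name `encode_windows`, the window size, and the use of `-` as terminal padding are all assumptions made for this example.

```python
# Hypothetical sketch of an amino-acid-identity window encoding.
# Window size and padding symbol are assumptions, not values from the source.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_windows(sequence, window=5):
    """Return one feature vector per residue.

    Each vector concatenates a 20-dimensional one-hot encoding for every
    position in a window centred on the residue. Positions beyond the
    termini (and any non-standard residue) are encoded as all zeros.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre residue")
    half = window // 2
    padded = "-" * half + sequence + "-" * half  # '-' pads both termini
    features = []
    for i in range(len(sequence)):
        vec = []
        for aa in padded[i:i + window]:
            vec.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
        features.append(vec)
    return features


if __name__ == "__main__":
    feats = encode_windows("MKVLA", window=3)
    print(len(feats), "residues,", len(feats[0]), "features each")
```

A PSSM-based variant would replace each 20-dimensional one-hot block with the corresponding PSSM row for that position, keeping the same windowing logic.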
Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. 
An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence-derived features of RNA-binding proteins. A webserver implementation of both methods is freely available at http://einstein.cs.iastate.edu/RNABindRPlus/.
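For readers unfamiliar with the measures quoted above, the following sketch computes precision, recall, and the Matthews Correlation Coefficient from per-residue binary labels (1 = RNA-binding, 0 = non-binding). The function name `binary_metrics` and the toy labels are hypothetical; the formulas are the standard definitions and are not tied to any particular predictor.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute precision, recall and MCC for binary residue labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}


if __name__ == "__main__":
    # Toy labels for eight residues (illustrative only).
    y_true = [1, 1, 0, 0, 1, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

Because MCC uses all four cells of the confusion matrix, it is less sensitive than accuracy to the class imbalance typical of interface prediction, where non-binding residues greatly outnumber binding ones.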
The Protein–RNA Interface Database (PRIDB) is a comprehensive database of protein–RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein–RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein–RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein–RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein–RNA complexes are provided. The PRIDB database is freely available online at http://bindr.gdcb.iastate.edu/PRIDB.
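Interfacial residues of the kind described above are commonly derived from atomic coordinates with a distance cutoff. The sketch below shows that general idea only; it is not PRIDB's code, the 5.0 Å cutoff is an assumed value, and the function name, input layout, and coordinates are hypothetical.

```python
import math


def interface_residues(protein_atoms, rna_atoms, cutoff=5.0):
    """Return protein residue IDs with any atom within `cutoff` Å of RNA.

    protein_atoms, rna_atoms: lists of (residue_id, (x, y, z)) tuples.
    The 5.0 Å default is an assumption for illustration.
    """
    contacts = set()
    for res_id, p_xyz in protein_atoms:
        if res_id in contacts:
            continue  # residue already known to be interfacial
        if any(math.dist(p_xyz, r_xyz) <= cutoff for _, r_xyz in rna_atoms):
            contacts.add(res_id)
    return sorted(contacts)


if __name__ == "__main__":
    # Made-up coordinates: ARG10 sits near the nucleotide, GLY11 does not.
    protein = [
        ("ARG10", (0.0, 0.0, 0.0)),
        ("ARG10", (1.0, 0.0, 0.0)),
        ("GLY11", (20.0, 0.0, 0.0)),
    ]
    rna = [("A1", (3.0, 0.0, 0.0))]
    print(interface_residues(protein, rna))
```

The all-pairs loop is quadratic in the number of atoms; for whole-database processing a spatial index such as a k-d tree would be the usual choice.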
Repeated spillovers of the H1N1 pandemic virus (H1N1pdm09) from humans to pigs resulted in substantial evolution of influenza A viruses infecting swine, contributing to the genetic and antigenic diversity of influenza A viruses (IAV) currently circulating in swine. The reassortment with endemic swine viruses and maintenance of some of the H1N1pdm09 internal genes resulted in the circulation of different genomic constellations in pigs. Here, we performed a whole-genome phylogenetic analysis of 368 IAV circulating in swine from 2009 to 2016 in the United States. We identified 44 different genotypes, with the most common genotype (32.33%) containing a clade IV-A HA gene, a 2002-lineage NA gene, an M-pdm09 gene, and remaining gene segments of triple reassortant internal gene (TRIG) origin. To understand how different genetic constellations may relate to viral fitness, we compared the pathogenesis and transmission in pigs of six representative genotypes. Although all six genotypes efficiently infected pigs, they resulted in different degrees of pathology and viral shedding. These results highlight the vast H3N2 genetic diversity circulating in U.S. swine after 2009. This diversity has important implications in the control of this disease by the swine industry, as well as a potential risk for public health if swine-adapted viruses with H1N1pdm09 genes have an increased risk to humans, as occurred in the 2011-2012 and 2016 human variant H3N2v cases associated with exhibition swine.
IMPORTANCE: People continue to spread the 2009 H1N1 pandemic (H1N1pdm09) IAV to pigs, allowing H1N1pdm09 to reassort with endemic swine IAV. In this study, we determined the 8 gene combinations of swine H3N2 IAV detected from 2009 to 2016. We identified 44 different genotypes of H3N2, the majority of which contained at least one H1N1pdm09 gene segment. We compared six representative genotypes of H3N2 in pigs. 
All six genotypes efficiently infected pigs, but they resulted in different degrees of lung damage and viral shedding. These results highlight the vast genetic diversity of H3N2 circulating in U.S. swine after 2009, with important implications for the control of IAV for the swine industry. Because H1N1pdm09 is also highly adapted to humans, these swine viruses pose a potential risk to public health if swine-adapted viruses with H1N1pdm09 genes also have an increased risk for human infection.
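To make the notion of a genotype as a combination of eight gene-segment lineages concrete, the sketch below tallies distinct constellations across a set of isolates and reports each one's share. It is a minimal sketch, not the study's pipeline: the lineage labels, the segment ordering, and the toy isolates are assumptions, and in practice each label would come from a per-segment phylogenetic analysis.

```python
from collections import Counter

# The eight influenza A gene segments; the ordering here is an assumption.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def genotype_frequencies(isolates):
    """Count distinct eight-segment lineage constellations.

    isolates: list of dicts mapping each segment name to a lineage label.
    Returns (genotype, count, percent) tuples, most common first.
    """
    counts = Counter(tuple(iso[seg] for seg in SEGMENTS) for iso in isolates)
    total = sum(counts.values())
    return [(g, n, 100.0 * n / total) for g, n in counts.most_common()]


if __name__ == "__main__":
    # Toy isolates with hypothetical lineage labels.
    trig_m_pdm = dict(PB2="TRIG", PB1="TRIG", PA="TRIG", HA="IV-A",
                      NP="TRIG", NA="2002", M="pdm09", NS="TRIG")
    more_pdm = dict(trig_m_pdm, PA="pdm09", NP="pdm09")
    for genotype, n, pct in genotype_frequencies(
            [trig_m_pdm, trig_m_pdm, more_pdm]):
        print(n, f"{pct:.1f}%", genotype)
```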
These data suggest that vaccine composition and control efforts should consider IAV diversity within swine production regions in addition to aggregated national patterns.
In 2016, a total of 18 human infections with influenza A(H3N2) virus occurred after exposure to influenza-infected swine at 7 agricultural fairs. Sixteen of these cases were the result of infection by a reassorted virus with increasing prevalence among US swine containing a hemagglutinin gene from 2010–11 human seasonal H3N2 strains.
H4Nx viruses were reported in swine in Canada and China, but had not been recognized in swine in the USA. In late 2015, an avian-origin H4N6 influenza A virus was isolated from pigs in the United States during a routine diagnostic investigation of clinical respiratory disease in the herd. Serological analysis of additional pigs at the farm and other pigs within the swine production system indicated that the virus did not efficiently transmit from pig to pig, and the mode of transmission to swine could not be determined. The isolate was characterized at the molecular level, and its pathogenesis and transmission were experimentally evaluated in pigs. Although the virus replicated in the lungs of pigs and caused mild pulmonary lesions, there was no evidence of replication in the upper respiratory tract or transmission to indirect contacts, supporting the findings on the farm.