Background: Several computational methods have been developed to predict protein-protein interactions from amino acid sequences, but most of those methods are intended for the interactions within a species rather than for interactions across different species. Methods for predicting interactions between homogeneous proteins are not appropriate for finding those between heterogeneous proteins since they do not distinguish the interactions between proteins of the same species from those of different species. Results: We developed a new method for representing a protein sequence of variable length in a frequency vector of fixed length, which encodes the relative frequency of three consecutive amino acids of a sequence. We built a support vector machine (SVM) model to predict human proteins that interact with virus proteins. In two types of viruses, human papillomaviruses (HPV) and hepatitis C virus (HCV), our SVM model achieved an average accuracy above 80%, which is higher than that of another SVM model with a different representation scheme. Using the SVM model and Gene Ontology (GO) annotations of proteins, we predicted new interactions between virus proteins and human proteins. Conclusions: Encoding the relative frequency of amino acid triplets of a protein sequence is a simple yet powerful representation method for predicting protein-protein interactions across different species. The representation method has several advantages: (1) it enables a prediction model to achieve a better performance than other representations, (2) it generates feature vectors of fixed length regardless of the sequence length, and (3) the same representation is applicable to different types of proteins.
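The fixed-length triplet representation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 20-letter alphabet, the resulting 8000-dimensional vector, the normalization by the number of counted triplets, and the function name `triplet_frequency_vector` are all assumptions made for illustration.

```python
from itertools import product

# Assumption: the 20 standard amino acids, giving 20**3 = 8000 possible triplets.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TRIPLETS = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
TRIPLET_INDEX = {triplet: i for i, triplet in enumerate(TRIPLETS)}


def triplet_frequency_vector(sequence):
    """Map a protein sequence of any length to a fixed-length vector.

    Each entry is the relative frequency of one amino acid triplet,
    counted over all overlapping windows of three consecutive residues.
    Windows containing a non-standard residue are skipped (an assumption).
    """
    seq = sequence.upper()
    vector = [0.0] * len(TRIPLETS)
    counted = 0
    for i in range(len(seq) - 2):
        index = TRIPLET_INDEX.get(seq[i:i + 3])
        if index is not None:
            vector[index] += 1.0
            counted += 1
    if counted:
        vector = [value / counted for value in vector]
    return vector


if __name__ == "__main__":
    vec = triplet_frequency_vector("ACDACD")
    print(len(vec), vec[TRIPLET_INDEX["ACD"]])
```

Because the vector length depends only on the alphabet, sequences of different lengths yield vectors of the same dimension, which is what allows them to be fed to a standard classifier such as an SVM.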
We propose a sequence-based multiple classifier system, i.e., rotation forest, to infer protein-protein interactions (PPIs). Moreover, the Moran autocorrelation descriptor is used to encode each interacting protein pair. Experimental results on Saccharomyces cerevisiae and Helicobacter pylori datasets show that our approach outperforms those previously published in the literature, which demonstrates the effectiveness of the proposed method.
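As a rough illustration of the Moran autocorrelation descriptor mentioned in this abstract, the sketch below computes it for a single physicochemical property. Several choices are assumptions rather than details from the abstract: the Kyte-Doolittle hydropathy scale as the property, the maximum lag of 3, the concatenation of the two proteins' descriptors to encode a pair, and the function names.

```python
# Kyte-Doolittle hydropathy values (assumed property; the paper may use others).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moran_autocorrelation(sequence, max_lag=3, scale=HYDROPATHY):
    """Return Moran autocorrelation values for lags 1..max_lag.

    I(d) = [sum_{i=1}^{N-d} (P_i - mean)(P_{i+d} - mean) / (N - d)]
           / [sum_{i=1}^{N} (P_i - mean)^2 / N]
    Residues missing from the scale are ignored (an assumption).
    """
    values = [scale[r] for r in sequence.upper() if r in scale]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return [0.0] * max_lag  # constant property: no correlation signal
    descriptor = []
    for lag in range(1, max_lag + 1):
        covariance = sum(
            (values[i] - mean) * (values[i + lag] - mean)
            for i in range(n - lag)
        ) / (n - lag)
        descriptor.append(covariance / variance)
    return descriptor


def encode_pair(seq_a, seq_b, max_lag=3):
    """Encode a protein pair by concatenating both descriptors (assumption)."""
    return moran_autocorrelation(seq_a, max_lag) + moran_autocorrelation(seq_b, max_lag)


if __name__ == "__main__":
    print(moran_autocorrelation("IRIRIRIR", max_lag=2))
```

In practice a descriptor of this kind is usually computed for several properties and a larger range of lags, and the resulting pair vectors are then passed to the classifier.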
Background: Viral infection involves a large number of protein-protein interactions (PPIs) between a virus and its host. These interactions range from the initial binding of viral coat proteins to host membrane receptors to the hijacking of the host transcription machinery by viral proteins. Therefore, identifying PPIs between a virus and its host helps in understanding the mechanism of viral infection and in designing antiviral drugs. Many computational methods have been developed to predict PPIs, but most of them are intended for PPIs within a species rather than PPIs across different species, such as those between virus and host. Results: In this study, we developed a prediction model of virus-host PPIs that is applicable to new viruses and hosts. We tested the prediction model on independent datasets of virus-host PPIs, which were not used in training the model. Despite the low sequence similarity between proteins in the training datasets and target proteins in the test datasets, the prediction model showed a high performance comparable to the best performance of other methods for single virus-host PPIs. Conclusions: Our method will be particularly useful for finding PPIs between a host and new viruses for which little information is available. The program and support data are available at http://bclab.inha.ac.kr/VirusHostPPI.
Structural analysis of protein-RNA complexes is labor-intensive, yet provides insight into the interaction patterns between a protein and RNA. As the number of protein-RNA complex structures reported has increased substantially in the last few years, a systematic method is required for automatically identifying interaction patterns. This paper presents a computational analysis of the hydrogen bonds in the most representative set of protein-RNA complexes. The analysis revealed several interesting interaction patterns. (1) While residues in the β-sheets favored unpaired nucleotides, residues in the helices showed no preference and residues in turns favored paired nucleotides. (2) The backbone hydrogen bonds were more dominant than the base hydrogen bonds in the paired nucleotides, but the reverse was observed in the unpaired nucleotides. (3) The protein-RNA complexes contained more paired nucleotides than unpaired nucleotides, but the unpaired nucleotides were observed more frequently interacting with the proteins. And (4) Arg-U, Thr-A, Lys-A, and Asn-U were the most frequently observed pairs. The interaction patterns discovered from the analysis will provide us with useful information in predicting the structure of the RNA binding protein and the structure of the protein binding RNA.
Visualizing RNA secondary structures and pseudoknot structures is essential to bioinformatics systems that deal with RNA structures. However, many bioinformatics systems use heterogeneous data structures and incompatible software components, so integration of software components (including a visualization component) into a system can be hindered by incompatibilities between the components of the system. This paper presents an XML web service and web application program for visualizing RNA secondary structures with pseudoknots. Experimental results show that the PseudoViewer web service and web application are useful for resolving many problems with incompatible software components as well as for visualizing large-scale RNA secondary structures with pseudoknots of any type. The web service and web application are available at .