Lysine succinylation is an emerging protein post-translational modification that plays an important role in regulating cellular processes in both eukaryotic and prokaryotic cells. However, succinylation sites are particularly difficult to detect because the experimental technologies used are often time-consuming and costly. Thus, an accurate computational method for predicting succinylation sites may help researchers design their experiments and understand the molecular mechanism of succinylation. In this study, a novel computational tool termed SuccinSite has been developed to predict protein succinylation sites by incorporating three sequence encodings, i.e., k-spaced amino acid pairs, binary and amino acid index properties. A random forest classifier was then trained with these encodings to build the predictor. The SuccinSite predictor achieves an AUC score of 0.802 in 5-fold cross-validation and performs significantly better than existing predictors on a comprehensive independent test set. Furthermore, informative features and predominant rules (i.e. feature combinations) were extracted from the trained random forest model for an improved interpretation of the predictor. Finally, we also compiled a database covering 4411 experimentally verified succinylation proteins with 12 456 lysine succinylation sites. Taken together, these results suggest that SuccinSite would be a helpful computational resource for succinylation site prediction. The web-server, datasets, source code and database are freely available at http://systbio.cau.edu.cn/SuccinSite/.
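To make the k-spaced amino acid pair encoding named above more concrete, here is a minimal Python sketch of one common way to compute it for a sequence window around a candidate lysine. The function name `cksaap`, the maximum gap `k_max`, and the normalization by the number of pair positions are assumptions chosen for illustration; the abstract does not specify SuccinSite's window size or parameters, so this should not be read as the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def cksaap(window: str, k_max: int = 2) -> list:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    Returns 400 * (k_max + 1) values. For each gap k, every ordered
    residue pair (x, y) is counted where y sits k + 1 positions after x,
    then divided by the number of pair positions in the window.
    Pairs containing non-standard characters (e.g. padding) are ignored.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_positions = len(window) - k - 1
        for i in range(max(n_positions, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        total = max(n_positions, 1)  # avoid division by zero on short windows
        features.extend(counts[p] / total for p in pairs)
    return features

# Toy window (hypothetical): a lysine flanked by two alanines.
vector = cksaap("AKA", k_max=1)
print(len(vector))
```

A vector of this kind could be concatenated with the binary and amino acid index encodings before being passed to a random forest classifier, which is the overall design the abstract describes.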
Background Gene expression is a key determinant of cellular response. Natural variation in gene expression bridges genetic variation to phenotypic alteration. Identification of the regulatory variants controlling the gene expression in response to drought, a major environmental threat of crop production worldwide, is of great value for drought-tolerant gene identification. Results A total of 627 RNA-seq analyses are performed for 224 maize accessions which represent a wide genetic diversity under three water regimes; 73,573 eQTLs are detected for about 30,000 expressing genes with high-density genome-wide single nucleotide polymorphisms, reflecting a comprehensive and dynamic genetic architecture of gene expression in response to drought. The regulatory variants controlling the gene expression constitutively or drought-dynamically are unraveled. Focusing on dynamic regulatory variants resolved to genes encoding transcription factors, a drought-responsive network reflecting a hierarchy of transcription factors and their target genes is built. Moreover, 97 genes are prioritized to associate with drought tolerance due to their expression variations through the Mendelian randomization analysis. One of the candidate genes, Abscisic acid 8′-hydroxylase, is verified to play a negative role in plant drought tolerance. Conclusions This study unravels the effects of genetic variants on gene expression dynamics in drought response which allows us to better understand the role of distal and proximal genetic effects on gene expression and phenotypic plasticity. The prioritized drought-associated genes may serve as direct targets for functional investigation or allelic mining.
Computational analysis of human-virus protein-protein interaction (PPI) data is an effective route to a systems-level understanding of the molecular mechanisms of viral infection. Previous work has mainly focused on characterizing the global properties of viral targets within the entire human PPI network. In comparison, how viruses manipulate host local networks (e.g., human protein complexes) has rarely been addressed from a computational perspective. By mainly integrating information about human-virus PPIs, human protein complexes, and gene expression profiles, we performed a large-scale analysis of virally targeted complexes (VTCs) related to five common human-pathogenic viruses, including influenza A virus subtype H1N1, human immunodeficiency virus type 1, Epstein-Barr virus, human papillomavirus, and hepatitis C virus. We found that viral targets are enriched within human protein complexes. We observed in the context of VTCs that viral targets tended to have a high within-complex degree and to be scaffold and housekeeping proteins. Complexes that are essential for viral propagation were simultaneously targeted by multiple viruses. We characterized the periodic expression patterns of VTCs and provided the corresponding candidates that may be involved in the manipulation of the host cell cycle. As a potential application of the current analysis, we proposed a VTC-based antiviral drug target discovery strategy. Finally, we developed an online VTC-related platform known as VTcomplex (http://zzdlab.com/vtcomplex/index.php or http://systbio.cau.edu.cn/vtcomplex/index.php). We hope that the current analysis can provide new insights into the global landscape of human-virus PPIs at the VTC level and that the developed VTcomplex will become a vital resource for the community.
IMPORTANCE Although human protein complexes have been reported to be directly related to viral infection, previous studies have not systematically investigated human-virus PPIs from the perspective of human protein complexes. To the best of our knowledge, we have presented here the most comprehensive and in-depth analysis of human-virus PPIs in the context of VTCs. Our findings confirm that human protein complexes are heavily involved in viral infection. The observed preferences of virally targeted subunits within complexes reflect the mechanisms used by viruses to manipulate host protein complexes. The identified periodic expression patterns of the VTCs and the corresponding candidates could increase our understanding of how viruses manipulate the host cell cycle. Finally, our proposed conceptual application framework of VTCs and the developed VTcomplex could provide new hints to develop antiviral drugs for the clinical treatment of viral infections.
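The "within-complex degree" mentioned in the abstract above can be illustrated with a short Python sketch. It assumes the usual reading of the term, namely the number of distinct interaction partners a subunit has among the other members of the same complex; the function name and the toy protein identifiers are hypothetical and are not taken from the VTcomplex resource.

```python
def within_complex_degree(complex_members, interactions):
    """Count, for each subunit, its distinct partners inside the complex.

    complex_members: iterable of protein identifiers in one complex.
    interactions: iterable of (protein_a, protein_b) pairs from a PPI
    network; pairs are treated as undirected, and self-interactions and
    duplicate pairs are ignored.
    """
    members = set(complex_members)
    degree = dict.fromkeys(members, 0)
    seen = set()
    for a, b in interactions:
        edge = frozenset((a, b))
        if a == b or edge in seen:
            continue
        if a in members and b in members:
            seen.add(edge)
            degree[a] += 1
            degree[b] += 1
    return degree

# Toy data (hypothetical identifiers).
toy_complex = ["P1", "P2", "P3"]
toy_ppis = [("P1", "P2"), ("P2", "P3"), ("P2", "P1"), ("P3", "P9")]
print(within_complex_degree(toy_complex, toy_ppis))
```

In this toy example P2 touches both other subunits, so it would be the kind of highly connected subunit that the analysis reports viruses tend to target.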
Motivation To complement experimental efforts, machine learning-based computational methods are playing an increasingly important role in predicting human-virus protein-protein interactions (PPIs). Furthermore, transfer learning can effectively apply prior knowledge obtained from a large source dataset/task to a small target dataset/task, improving prediction performance. Results To predict interactions between human and viral proteins, we combine evolutionary sequence profile features with a Siamese convolutional neural network (CNN) architecture and a multi-layer perceptron. Our architecture outperforms various feature encoding-based machine learning methods and state-of-the-art prediction methods. As our main contribution, we introduce two transfer learning methods (i.e., ‘frozen’ type and ‘fine-tuning’ type) that reliably predict interactions in a target human-virus domain based on training in a source human-virus domain, by retraining CNN layers. Finally, we utilize the ‘frozen’ type transfer learning approach to predict human-SARS-CoV-2 PPIs; the results indicate that our predictions are topologically and functionally similar to experimentally known interactions. Supplementary information Supplementary data are available at Bioinformatics online.
Viral infectious diseases cause millions of deaths every year, and their treatment remains a huge public health challenge. Therefore, an in-depth understanding of human–virus protein–protein interactions (PPIs) as the molecular interface between a virus and its host cell is of paramount importance to obtain new insights into the pathogenesis of viral infections and the development of antiviral therapeutic treatments. However, current human–virus PPI database resources are incomplete, lack annotation and usually do not provide the opportunity to computationally predict human–virus PPIs. Here, we present the Human–Virus Interaction DataBase (HVIDB, http://zzdlab.com/hvidb/) that provides comprehensively annotated human–virus PPI data and seamlessly integrates online PPI prediction tools. Currently, HVIDB highlights 48 643 experimentally verified human–virus PPIs covering 35 virus families, 6633 virally targeted host complexes, 3572 host dependency/restriction factors as well as 911 experimentally verified/predicted 3D complex structures of human–virus PPIs. Furthermore, our database resource provides tissue-specific expression profiles of 6790 human genes that are targeted by viruses and 129 Gene Expression Omnibus series of differentially expressed genes post-viral infections. Based on these multifaceted and annotated data, our database allows the users to easily obtain reliable information about PPIs of various human viruses and conduct an in-depth analysis of their inherent biological significance. In particular, HVIDB also integrates well-performing machine learning models to predict interactions between the human host and viral proteins that are based on (i) sequence embedding techniques, (ii) interolog mapping and (iii) domain–domain interaction inference. We anticipate that HVIDB will serve as a one-stop knowledge base to further guide hypothesis-driven experimental efforts to investigate human–virus relationships.
The identification of plant-pathogen protein-protein interactions (PPIs) is an attractive and challenging research topic for deciphering the complex molecular mechanism of plant immunity and pathogen infection. Considering that the experimental identification of plant-pathogen PPIs is time-consuming and labor-intensive, computational methods are emerging as an important strategy to complement the experimental methods. In this work, we first evaluated the performance of traditional computational methods such as interolog, domain-domain interaction and domain-motif interaction in predicting known plant-pathogen PPIs. Owing to the low sensitivity of the traditional methods, we utilized Random Forest to build an inter-species PPI prediction model based on multiple sequence encodings and novel network attributes in the established plant PPI network. Critical assessment of the features demonstrated that the integration of sequence information and network attributes resulted in significant and robust performance improvement. Additionally, we also discussed the influence of Gene Ontology and gene expression information on the prediction performance. The Web server implementing the integrated prediction method, named InterSPPI, has been made freely available at http://systbio.cau.edu.cn/intersppi/index.php. InterSPPI could achieve a reasonably high accuracy with a precision of 73.8% and a recall of 76.6% in the independent test. To examine the applicability of InterSPPI, we also conducted cross-species and proteome-wide plant-pathogen PPI prediction tests. Taken together, we hope this work can provide a comprehensive understanding of the current status of plant-pathogen PPI predictions, and the proposed InterSPPI can become a useful tool to accelerate the exploration of plant-pathogen interactions.
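The interolog approach evaluated in the abstract above rests on a simple idea: if two proteins interact in a template species, their homologs in the plant and the pathogen are predicted to interact as well. The Python sketch below illustrates that transfer step only. The function name, the dictionary-based homolog mappings, and the toy identifiers are all hypothetical, and in practice the homolog assignments would come from a sequence-similarity search whose thresholds are not given here.

```python
def predict_interologs(template_ppis, plant_homologs, pathogen_homologs):
    """Transfer template interactions to plant-pathogen protein pairs.

    template_ppis: iterable of (protein_a, protein_b) known interactions.
    plant_homologs: dict mapping a template protein to a set of plant
        proteins considered homologous to it.
    pathogen_homologs: dict mapping a template protein to a set of
        pathogen proteins considered homologous to it.
    Returns a set of predicted (plant_protein, pathogen_protein) pairs.
    """
    predicted = set()
    for a, b in template_ppis:
        # Interactions are undirected, so try both orientations.
        for x, y in ((a, b), (b, a)):
            for plant in plant_homologs.get(x, ()):
                for pathogen in pathogen_homologs.get(y, ()):
                    predicted.add((plant, pathogen))
    return predicted

# Toy data (hypothetical identifiers).
template = [("T1", "T2"), ("T3", "T4")]
plant_map = {"T1": {"plantA"}, "T4": {"plantB"}}
pathogen_map = {"T2": {"effX"}, "T3": {"effY"}}
print(sorted(predict_interologs(template, plant_map, pathogen_map)))
```

Because a prediction is only possible when both partners have homologs in a known template interaction, coverage is limited by the template data, which is consistent with the low sensitivity the abstract reports for the traditional methods.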
Protein self-interaction, i.e. the interaction between two or more identical proteins expressed by one gene, plays an important role in the regulation of cellular functions. Considering the limitations of experimental self-interaction identification, it is necessary to design specific bioinformatics tools for self-interacting protein (SIP) prediction from protein sequence information. In this study, we proposed an improved computational approach for SIP prediction, termed SPAR (Self-interacting Protein Analysis serveR). Firstly, we developed an improved encoding scheme named critical residues substitution (CRS), in which the fine-grained domain-domain interaction information was taken into account. Then, by employing the Random Forest algorithm, the performance of CRS was evaluated and compared with several other encoding schemes commonly used for sequence-based protein-protein interaction prediction. Through the tenfold cross-validation tests on a balanced training dataset, CRS performed the best, with the average accuracy up to 72.01 %. We further integrated CRS with other encoding schemes and identified the most important features using the mRMR (the minimum redundancy maximum relevance) feature selection method. Our SPAR model with selected features achieved an average accuracy of 92.09 % on the human-independent test set (the ratio of positives to negatives was about 1:11). Besides, we also evaluated the performance of SPAR on an independent yeast test set (the ratio of positives to negatives was about 1:8) and obtained an average accuracy of 76.96 %. The results demonstrate that SPAR is capable of achieving a reasonable performance in cross-species application. The SPAR server is freely available for academic use at http://systbio.cau.edu.cn/zzdlab/spar/ .