Motivation: Sequence-based protein–protein interaction (PPI) prediction is a fundamental problem in computational biology. Extensive research effort has gone into extracting predefined features from protein sequences and training statistical algorithms on those features to classify PPIs. However, such explicit features are usually costly to extract and typically offer limited coverage of PPI information.
Results: We present an end-to-end framework, PIPR (Protein–Protein Interaction Prediction Based on Siamese Residual RCNN), for PPI prediction using only the protein sequences. PIPR incorporates a deep residual recurrent convolutional neural network in a Siamese architecture, leveraging both robust local features and contextualized information, both of which are important for capturing the mutual influence of protein sequences. PIPR reduces the data pre-processing effort required by other systems and generalizes well to different application scenarios. Experimental evaluations show that PIPR outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows promising performance on the more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short.
Availability and implementation: The implementation is available at https://github.com/muhaochen/seq_ppi.git.
Supplementary information: Supplementary data are available at Bioinformatics online.
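The Siamese idea in the abstract above, one encoder with shared parameters applied to both sequences, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from PIPR: a mean-pooled random embedding stands in for the residual RCNN encoder, the two encodings are combined by element-wise product, and the names `make_encoder` and `interaction_score` are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def make_encoder(dim=8, seed=0):
    """Build a toy sequence encoder (a stand-in, NOT the residual RCNN).

    Each residue gets a fixed random vector; a sequence is encoded as the
    mean of its residue vectors.
    """
    rng = random.Random(seed)
    table = {aa: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}

    def encode(seq):
        vectors = [table[aa] for aa in seq]
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    return encode


def interaction_score(encode, seq_a, seq_b):
    """Score a pair by passing BOTH sequences through the SAME encoder."""
    h_a, h_b = encode(seq_a), encode(seq_b)
    # Assumed combination step: element-wise product, then a sigmoid.
    combined = [x * y for x, y in zip(h_a, h_b)]
    return 1.0 / (1.0 + math.exp(-sum(combined)))


encode = make_encoder()
score_ab = interaction_score(encode, "MKTAYIAKQR", "GSHMLEDPVA")
score_ba = interaction_score(encode, "GSHMLEDPVA", "MKTAYIAKQR")
```

Because the encoder is shared and the assumed combination is symmetric, the score does not depend on the order in which the two proteins are presented, which is one practical reason to use a Siamese design for pairwise prediction.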
There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability (5.84× for eQTLs; P = 1.19 × 10) across 41 diseases and complex traits than annotations containing all significant molecular QTLs (1.80× for expression (e)QTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06×; P = 1.20 × 10). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.
Bud formation is an adaptive trait that temperate forest trees have acquired to facilitate seasonal synchronization. We have characterized transcriptome-level changes that occur during bud formation of white spruce [Picea glauca (Moench) Voss], a primarily determinate species in which preformed stem units contained within the apical bud constitute most of next season's growth. Microarray analysis identified 4460 differentially expressed sequences in shoot tips during short day-induced bud formation. Cluster analysis revealed distinct temporal patterns of expression, and functional classification of genes in these clusters implied molecular processes that coincide with anatomical changes occurring in the developing bud. Comparing expression profiles in developing buds under long day and short day conditions identified possible photoperiod-responsive genes that may not be essential for bud development. Several genes putatively associated with hormone signalling were identified, and hormone quantification revealed distinct profiles for abscisic acid (ABA), cytokinins, auxin and their metabolites that can be related to morphological changes to the bud. Comparison of gene expression profiles during bud formation in different tissues revealed 108 genes that are differentially expressed only in developing buds and show greater transcript abundance in developing buds than other tissues. These findings provide a temporal roadmap of bud formation in white spruce.
Recent successes in genome-wide association studies (GWASs) make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH) arising from multiple causal variants at a locus. We developed a computational method to infer the probability of AH and applied it to three GWASs and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4,152 loci with strong evidence of AH. The proportion of all loci with identified AH is 4%-23% in eQTLs, 35% in GWASs of high-density lipoprotein (HDL), and 23% in GWASs of schizophrenia. For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH (R = 0.85, p = 2.2 × 10), indicating that limited statistical power prevents identification of AH at other loci. Understanding the extent of AH may guide the development of new methods for fine mapping and association mapping of complex traits.
In the autumn, stems of woody perennials such as forest trees undergo a transition from active growth to dormancy. We used microarray transcriptomic profiling in combination with a proteomics analysis to elucidate processes that occur during this growth-to-dormancy transition in a conifer, white spruce (Picea glauca [Moench] Voss). Several differentially expressed genes were likely associated with the developmental transition that occurs during growth cessation in the cambial zone and the concomitant completion of cell maturation in vascular tissues. Genes encoding cell wall and membrane biosynthetic enzymes showed transcript abundance patterns consistent with completion of cell maturation, and also with cell wall and membrane modifications potentially enabling cells to withstand the harsh conditions of winter. Several differentially expressed genes encoded putative regulators of cambial activity, cell development and the photoperiodic pathway. Reconfiguration of carbon allocation figured centrally in the tree's overwintering preparations. For example, genes associated with carbon-based defences such as terpenoids were down-regulated, while many genes associated with protein-based defences and other stress mitigation mechanisms were up-regulated. Several of these correspond to proteins that accumulated during the growth-to-dormancy transition, emphasizing the importance of stress protection in the tree's adaptive response to overwintering.
Colorado potato beetle (CPB) is a leading pest of solanaceous plants. Despite the economic importance of this pest, surprisingly few studies have been carried out to characterize its molecular interaction with the potato plant. In particular, little is known about the effect of CPB elicitors on gene expression associated with the plant's defense response. To discover putative CPB elicitor-responsive genes, the TIGR 11,421 EST Solanaceae microarray was used to identify genes that are differentially expressed in response to the addition of CPB regurgitant to wounded potato leaves. By applying a cutoff corresponding to an adjusted P-value of <0.01 and a fold change of >1.5 or <0.67, we found that 73 of these genes are induced by regurgitant treatment of wounded leaves when compared to wounding alone, whereas 54 genes are repressed by this treatment. This gene set likely includes regurgitant-responsive genes as well as wounding-responsive genes whose expression patterns are further enhanced by the presence of regurgitant. Real-time polymerase chain reaction was used to validate differential expression by regurgitant treatment for five of these genes. In general, genes that encoded proteins involved in secondary metabolism and stress were induced by regurgitant; genes associated with photosynthesis were repressed. One induced gene encodes aromatic amino acid decarboxylase, which is responsible for synthesis of the precursor of 2-phenylethanol. This is significant because 2-phenylethanol is recognized by the CPB predator Perillus bioculatus. In addition, three of the 16 type 1 and type 2 proteinase inhibitor clones present on the potato microarray were repressed by application of CPB regurgitant to wounded leaves. Given that proteinase inhibitors are known to interfere with digestion of proteins in the insect midgut, repression of these proteinase inhibitors by CPB may weaken this component of the plant's defense arsenal. These data suggest that beyond the wound response, CPB elicitors play a role in mediating the plant/insect interaction.
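The selection rule stated in the abstract above (adjusted P-value < 0.01 together with a fold change > 1.5 or < 0.67) can be written as a small filter. This is a minimal sketch, assuming fold change is expressed on a linear scale as regurgitant plus wounding relative to wounding alone; the `ProbeResult` record, the `classify` helper, and the probe values are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    probe_id: str       # hypothetical identifier
    fold_change: float  # assumed linear scale: (regurgitant + wound) / wound alone
    adj_p: float        # multiple-testing-adjusted P-value


def classify(result, p_cutoff=0.01, induced_above=1.5, repressed_below=0.67):
    """Apply the cutoffs quoted in the abstract to one probe."""
    if result.adj_p >= p_cutoff:
        return "unchanged"  # not significant after adjustment
    if result.fold_change > induced_above:
        return "induced"
    if result.fold_change < repressed_below:
        return "repressed"
    return "unchanged"      # significant, but the change is too small


# Made-up values, chosen only to exercise each branch of the rule.
results = [
    ProbeResult("probe_A", 2.10, 0.001),
    ProbeResult("probe_B", 0.50, 0.004),
    ProbeResult("probe_C", 1.80, 0.200),
    ProbeResult("probe_D", 1.20, 0.001),
]
calls = {r.probe_id: classify(r) for r in results}
```

The two fold-change thresholds are approximately reciprocal (1/1.5 ≈ 0.67), so induction and repression are held to the same magnitude of change.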
The functional impact of protein mutations is reflected in altered conformation and thermodynamics of protein–protein interactions (PPIs). Changes in two interacting proteins upon mutation are commonly quantified by computational approaches. Hence, extensive research effort has been devoted to extracting energetic or structural features from proteins, followed by statistical learning methods that estimate the effects of mutations on PPI properties. Nonetheless, such features require extensive human labor and expert knowledge to obtain, and have limited ability to reflect point mutations. We present an end-to-end deep learning framework, MuPIPR (Mutation Effects in Protein–protein Interaction PRediction Using Contextualized Representations), to estimate the effects of mutations on PPIs. MuPIPR incorporates a contextualized representation mechanism of amino acids to propagate the effects of a point mutation to surrounding amino acid representations, thereby amplifying the subtle change in a long protein sequence. On top of that, MuPIPR leverages a Siamese residual recurrent convolutional neural encoder to encode a wild-type protein pair and its mutant pair. Multi-layer perceptron regressors are applied to the protein pair representations to predict the quantifiable changes of PPI properties upon mutations. Experimental evaluations show that, with only sequence information, MuPIPR outperforms various state-of-the-art systems on estimating the changes of binding affinity for SKEMPI v1, and offers comparable performance on SKEMPI v2. MuPIPR also demonstrates state-of-the-art performance on estimating the changes of buried surface areas. The software implementation is available at https://github.com/guangyu-zhou/MuPIPR.