DeepMind presented remarkably accurate predictions at the recent CASP14 protein structure prediction assessment conference. We explored network architectures incorporating related ideas and obtained the best performance with a three-track network in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging X-ray crystallography and cryo-EM structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short circuiting traditional approaches which require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
During virus infection, the adaptor proteins MAVS and STING transduce signals from the cytosolic nucleic acid sensors RIG-I and cGAS, respectively, to induce type I interferons (IFNs) and other antiviral molecules. Here we show that MAVS and STING harbor two conserved serine and threonine clusters that are phosphorylated by the kinases IKK and/or TBK1 in response to stimulation. Phosphorylated MAVS and STING then bind to a positively charged surface of interferon regulatory factor 3 (IRF3) and thereby recruit IRF3 for its phosphorylation and activation by TBK1. We further show that TRIF, an adaptor protein in Toll-like receptor signaling, activates IRF3 through a similar phosphorylation-dependent mechanism. These results reveal that phosphorylation of innate adaptor proteins is an essential and conserved mechanism that selectively recruits IRF3 to activate the type I IFN pathway.
Deep learning for protein interactions: The use of deep learning has revolutionized the field of protein modeling. Humphreys et al. combined this approach with proteome-wide, coevolution-guided protein interaction identification to conduct a large-scale screen of protein-protein interactions in yeast (see the Perspective by Pereira and Schwede). The authors generated predicted interactions and accurate structures for complexes spanning key biological processes in Saccharomyces cerevisiae. The complexes include larger protein assemblies such as trimers, tetramers, and pentamers and provide insights into biological function. —VV
Predicting protein pairs: Biological function is driven by interaction between proteins. High-throughput experimental techniques have provided large datasets of protein interactions in several organisms; however, much combinatorial space remains uncharted. Cong et al. predict protein interfaces by identifying coevolving residues in aligned protein sequences (see the Perspective by Vajda and Emili). In comparison with gold-standard and negative control sets, they show that the accuracy is higher than for proteome-wide two-hybrid and mass spectrometry screens. The approach predicts 1618 protein interactions in Escherichia coli, 682 of which were unanticipated, and 911 interacting pairs in Mycobacterium tuberculosis, most of which had not been previously described. With an expected false-positive rate of between 10 and 20%, the predicted interactions and networks provide an excellent starting point for further study. Science, this issue p. 185; see also p. 120
For centuries, biologists have used phenotypes to infer evolution. For decades, a handful of gene markers have given us a glimpse of the genotype to combine with phenotypic traits. Today, we can sequence entire genomes from hundreds of species and gain yet closer scrutiny. To illustrate the power of genomics, we have chosen skipper butterflies (Hesperiidae). The genomes of 250 representative species of skippers reveal rampant inconsistencies between their current classification and a genome-based phylogeny. We use a dated genomic tree to define tribes (six new) and subtribes (six new), to overhaul genera (nine new) and subgenera (three new), and to display convergence in wing patterns that fooled researchers for decades. We find that many skippers with similar appearance are distantly related, and several skippers with distinct morphology are close relatives. These conclusions are strongly supported by different genomic regions and are consistent with some morphological traits. Our work is a forerunner to genomic biology shaping biodiversity research.
Predicting phenotype from genotype represents the epitome of biological questions. Comparative genomics of appropriate model organisms holds the promise of making it possible. However, the high heterozygosity of many eukaryotes currently prohibits assembling their genomes. Here, we report the 376 Mb genome sequence of Papilio glaucus (Pgl), the first sequenced genome from the Papilionidae family. We obtained the genome from a wild-caught specimen using a cost-effective strategy that overcomes the high (2%) heterozygosity problem. Comparative analyses suggest the molecular bases of various phenotypic traits, including terpene production in the Papilionidae-specific organ, the osmeterium. Comparison of Pgl and Papilio canadensis (Pca) transcriptomes reveals mutation hotspots (4% of genes) associated with their divergence: four key circadian clock proteins are enriched in inter-species mutations and are likely responsible for the difference in pupal diapause. Finally, the Pgl genome confirms Papilio appalachiensis as a hybrid of Pgl and Pca, but suggests it inherited 3/4 of its genes from Pca.
We present an overview of the ninth round of Critical Assessment of Protein Structure Prediction (CASP9) ‘Template free modeling’ category (FM). Prediction models were evaluated using a combination of established structural and sequence comparison measures and a novel automated method designed to mimic manual inspection by capturing both global and local structural features. These scores were compared to those assigned manually over a diverse subset of target domains. Scores were combined to compare overall performance of participating groups and to estimate rank significance. Moreover, we discuss a few examples of free modeling targets to highlight the progress and bottlenecks of current prediction methods. Notably, a server prediction model for a single target (T0581) improved significantly over the closest structure template (44% GDT increase). This accomplishment represents the ‘winner’ of the CASP9 FM category. A number of human expert groups submitted slight variations of this model, highlighting a trend for human experts to act as “meta predictors” by correctly selecting among models produced by the top-performing automated servers. The details of evaluation are available at http://prodata.swmed.edu/CASP9/
The Critical Assessment of Protein Structure Prediction round 9 (CASP9) aimed to evaluate predictions for 129 experimentally determined protein structures. To assess tertiary structure predictions, these target structures were divided into domain-based evaluation units that were then classified into two assessment categories: template based modeling (TBM) and template free modeling (FM). CASP9 targets were split into domains of structurally compact evolutionary modules. For the targets with more than one defined domain, the decision to split structures into domains for evaluation was based on server performance. Target domains were categorized based on their evolutionary relatedness to existing templates as well as their difficulty levels indicated by server performance. Those target domains with sequence-related templates and high server prediction performance were classified as TBM, while those targets without identifiable templates and low server performance were classified as FM. However, using these generalizations for classification resulted in a blurred boundary between CASP9 assessment categories. Thus, the FM category included those domains without sequence detectable templates (25 target domains) as well as some domains with difficult to detect templates whose predictions were as poor as those without templates (5 target domains). Several interesting examples are discussed, including targets with sequence related templates that exhibit unusual structural differences, targets with homologous or analogous structure templates that are not detectable by sequence, and targets with new folds.