Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely-used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is significant need for improvement of currently available tools.
Approximate Bayesian computation (ABC) techniques have seen rapid and accelerating development in biology, with applications including population genetics, systems biology, and community ecology (reviewed in Beaumont 2010; Csilléry et al. 2010). However, the approximations and model assumptions inherent in ABC can make model choice and parameter estimation problematic, and careful simulation-based validation and assessment of posterior predictive power are required (Gelman et al.
Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable, Bayesian model (ribosome overhead costs Stochastic Evolutionary Model of Protein Production Rate [ROC SEMPPR]) to extract meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome-scale measurements of gene expression. Here, we demonstrate that it is possible to extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information as input. We demonstrate the utility of ROC SEMPPR using the Saccharomyces cerevisiae S288c genome. When we compare our model fits with previous approaches, we observe exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels (ρ>0.99 in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative data sets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated (ρ=0.95). Our estimates of codon-specific translational inefficiencies are also highly correlated with tRNA copy number-based estimates of ribosome pausing time (ρ=0.64), as are our estimates of gene expression with mRNA and ribosome profiling footprint-based estimates (ρ=0.53−0.74), thus supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias, and that failing to take mutation bias into account can lead to the misidentification of an amino acid’s “optimal” codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome-scale patterns of codon usage; accessing this information does not require gene expression measurements, but instead requires carefully formulated, biologically interpretable models.
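The point about mutation bias masking the selectively favored codon can be illustrated with a small sketch. It assumes a multinomial-logit form in which each synonymous codon's weight is exp(-ΔM - Δη·φ); this functional form, the function name `codon_probabilities`, and all parameter values are illustrative assumptions, not taken from the abstract above.

```python
import math

def codon_probabilities(delta_m, delta_eta, phi):
    """Probability of each synonymous codon for one amino acid.

    delta_m   -- hypothetical mutation-bias term per codon (lower = mutationally favored)
    delta_eta -- hypothetical translational-inefficiency term per codon (higher = costlier)
    phi       -- hypothetical protein production rate of the gene
    Assumed form: weight_i = exp(-delta_m[i] - delta_eta[i] * phi).
    """
    weights = [math.exp(-dm - de * phi) for dm, de in zip(delta_m, delta_eta)]
    total = sum(weights)
    return [w / total for w in weights]

# Toy two-codon amino acid: codon 0 is the reference (both terms zero).
# Codon 1 is strongly favored by mutation but is less efficiently translated.
DELTA_M = [0.0, -2.0]
DELTA_ETA = [0.0, 0.5]

for phi in (0.1, 2.0, 10.0):
    p0, p1 = codon_probabilities(DELTA_M, DELTA_ETA, phi)
    print(f"phi={phi:>4}: P(codon 0)={p0:.3f}  P(codon 1)={p1:.3f}")
```

With these made-up values, codon 1 remains the most frequent codon at a moderately high production rate (φ = 2) even though selection favors codon 0; only at a much higher rate (φ = 10) does codon 0 dominate. Reading the "optimal" codon off the most common codon in expressed genes, without accounting for the mutation term, would therefore pick the wrong codon in this toy case.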
Background: Any method that de novo predicts protein function should do better than random. More challenging, it also ought to outperform simple homology-based inference. Methods: Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar or lower limit for future improvements. Results and conclusions: During the development of these methods, we faced two surprises. Firstly, our most successful implementation for the baseline ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Secondly, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods span from top to bottom performers at CAFA, but also the reasons for these differences were unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.
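As a rough illustration of what homology-based inference involves, the sketch below transfers annotation terms from database hits to a query and scores the result by precision and recall at a threshold. It is a generic toy baseline, not the implementation described in the abstract above; the function names, the assumption that hit scores are normalized to the range 0 to 1, and the hit and annotation data are all hypothetical.

```python
def transfer_annotations(hits, annotations):
    """Score each term by the best-scoring annotated hit that carries it.

    hits        -- list of (protein_id, score) pairs; scores assumed to lie in [0, 1]
    annotations -- dict mapping protein_id to a set of annotation terms
    """
    scores = {}
    for protein_id, score in hits:
        for term in annotations.get(protein_id, ()):
            scores[term] = max(scores.get(term, 0.0), score)
    return scores

def precision_recall(predicted, truth, threshold):
    """Precision and recall of the terms predicted at or above the threshold."""
    called = {term for term, score in predicted.items() if score >= threshold}
    if not called or not truth:
        return 0.0, 0.0
    true_positives = len(called & truth)
    return true_positives / len(called), true_positives / len(truth)

# Hypothetical search results and annotations for one query protein.
HITS = [("P1", 0.9), ("P2", 0.4)]
ANNOTATIONS = {
    "P1": {"GO:0003824", "GO:0016787"},
    "P2": {"GO:0005515"},
}
TRUTH = {"GO:0003824", "GO:0005515"}

predicted = transfer_annotations(HITS, ANNOTATIONS)
for threshold in (0.5, 0.3):
    precision, recall = precision_recall(predicted, TRUTH, threshold)
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```

Even in this toy form, the outcome depends on choices such as how hit scores are normalized, whether terms are scored by the best hit or by a combination of hits, and where the threshold sits, which is consistent with the abstract's observation that implementation details matter.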
Local translation is vital to polarized cells such as neurons and requires a precise and robust distribution of different mRNAs and the translation machinery across the entire cell. The underlying mechanisms are poorly understood and important players are still to be identified. Here, we discovered a novel Rab5 effector complex which leads to mental retardation when genetically disrupted. The Five-subunit Endosomal Rab5 and RNA/ribosome intermediarY (FERRY) complex localizes to early endosomes and associates with the translation machinery and a subset of mRNAs, including mRNAs for mitochondrial proteins. It interacts directly with mRNA, binding different transcripts with different efficacies. Deletion of FERRY subunits reduces the endosomal localization of transcripts, indicating a role in mRNA distribution. Accordingly, FERRY-positive early endosomes harboring mRNA encoding mitochondrial proteins were observed in close proximity to mitochondria in neurons. Therefore, the FERRY complex plays a role in mRNA localization by linking early endosomes with the translation machinery.