Protein structures play an important role in bioinformatics, such as in predicting gene function or validating gene model annotation. However, determining protein structure was, until now, costly and time-consuming, which resulted in a structural biology bottleneck. With the release of such programs AlphaFold and ESMFold, this bottleneck has been reduced by several orders of magnitude, permitting protein structural comparisons of entire genomes within reasonable timeframes. MaizeGDB has leveraged this technological breakthrough by offering several new tools to accelerate protein structural comparisons between maize and other plants as well as human and yeast outgroups. MaizeGDB also offers bulk downloads of these comparative protein structure data, along with predicted functional annotation information. In this way, MaizeGDB is poised to assist maize researchers in assessing functional homology, gene model annotation quality, and other information unavailable to maize scientists even a few years ago.
Methods to predict orthology play an important role in bioinformatics for phylogenetic analysis by identifying orthologs within or across any level of biological classification. Sequence-based reciprocal best hit approaches are commonly used in functional annotation since orthologous genes are expected to share functions. The process is limited as it relies solely on sequence data and does not consider structural information and its role in function. Previously, determining protein structure was highly time-consuming, inaccurate, and limited to the size of the protein, all of which resulted in a structural biology bottleneck. With the release of AlphaFold, there are now over 200 million predicted protein structures, including full proteomes for dozens of key organisms. The reciprocal best structural hit approach uses protein structure alignments to identify structural orthologs. We propose combining both sequence- and structure-based reciprocal best hit approaches to obtain a more accurate and complete set of orthologs across diverse species, called Functional Annotations using Sequence and Structure Orthology (FASSO). Using FASSO, we annotated orthologs between five plant species (maize, sorghum, rice, soybean, Arabidopsis) and three distance outgroups (human, budding yeast, and fission yeast). We inferred over 270,000 functional annotations across the eight proteomes including annotations for over 5,600 uncharacterized proteins. FASSO provides confidence labels on ortholog predictions and flags potential misannotations in existing proteomes. We further demonstrate the utility of the approach by exploring the annotation of the maize proteome.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.