Motivation Over the last decade, RNA-Seq whole-genome sequencing has become a widely used method for measuring and understanding transcriptome-level changes in gene expression. Since RNA-Seq is relatively inexpensive, it can be used on multiple genomes to evaluate gene expression across many different conditions, tissues, and cell types. Although many tools exist to map and compare RNA-Seq at the genomics level, few web-based tools are dedicated to making data generated for individual genomic analysis accessible and reusable at a gene-level scale for comparative analysis between genes, across different genomes, and meta-analyses. Results To address this challenge, we revamped the comparative gene expression tool qTeller to take advantage of the growing number of public RNA-Seq datasets. qTeller allows users to evaluate gene expression data in a defined genomic interval and also perform two-gene comparisons across multiple user-chosen tissues. Though previously unpublished, qTeller has been cited extensively in the scientific literature, demonstrating its importance to researchers. Our new version of qTeller now supports multiple genomes for intergenomic comparisons, and includes capabilities for both mRNA and protein abundance datasets. Other new features include support for additional data formats, modernized interface and back-end database, and an optimized framework for adoption by other organisms’ databases. Availability The source code for qTeller is open-source and available through GitHub (https://github.com/Maize-Genetics-and-Genomics-Database/qTeller). A maize instance of qTeller is available at the Maize Genetics and Genomics database (MaizeGDB) (https://qteller.maizegdb.org/), where we have mapped over 200 unique datasets from GenBank across 27 maize genomes. Supplementary information Supplementary data are available at Bioinformatics online.
Protein structures play an important role in bioinformatics, such as in predicting gene function or validating gene model annotation. However, determining protein structure was, until now, costly and time-consuming, which resulted in a structural biology bottleneck. With the release of such programs AlphaFold and ESMFold, this bottleneck has been reduced by several orders of magnitude, permitting protein structural comparisons of entire genomes within reasonable timeframes. MaizeGDB has leveraged this technological breakthrough by offering several new tools to accelerate protein structural comparisons between maize and other plants as well as human and yeast outgroups. MaizeGDB also offers bulk downloads of these comparative protein structure data, along with predicted functional annotation information. In this way, MaizeGDB is poised to assist maize researchers in assessing functional homology, gene model annotation quality, and other information unavailable to maize scientists even a few years ago.
Methods to predict orthology play an important role in bioinformatics for phylogenetic analysis by identifying orthologs within or across any level of biological classification. Sequence-based reciprocal best hit approaches are commonly used in functional annotation since orthologous genes are expected to share functions. The process is limited as it relies solely on sequence data and does not consider structural information and its role in function. Previously, determining protein structure was highly time-consuming, inaccurate, and limited to the size of the protein, all of which resulted in a structural biology bottleneck. With the release of AlphaFold, there are now over 200 million predicted protein structures, including full proteomes for dozens of key organisms. The reciprocal best structural hit approach uses protein structure alignments to identify structural orthologs. We propose combining both sequence- and structure-based reciprocal best hit approaches to obtain a more accurate and complete set of orthologs across diverse species, called Functional Annotations using Sequence and Structure Orthology (FASSO). Using FASSO, we annotated orthologs between five plant species (maize, sorghum, rice, soybean, Arabidopsis) and three distance outgroups (human, budding yeast, and fission yeast). We inferred over 270,000 functional annotations across the eight proteomes including annotations for over 5,600 uncharacterized proteins. FASSO provides confidence labels on ortholog predictions and flags potential misannotations in existing proteomes. We further demonstrate the utility of the approach by exploring the annotation of the maize proteome.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.