Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by the more than 500 000 genomes sequenced to date. By different estimates, only about half of the proteins predicted from sequenced genomes carry an accurate functional annotation, and this percentage varies drastically between organismal lineages. Such a large knowledge gap hampers all aspects of biological research and therefore stands in the way of genomic biology reaching its full potential. To address this issue, a brainstorming meeting funded by the National Science Foundation was held on 3–4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists in the same venue allowed for a comprehensive assessment of the current state of functional annotation of protein families. The major issues obstructing the field were also identified and discussed, which ultimately led to proposed solutions for moving forward.
Members of the DUF34 (domain of unknown function 34) family, also known as the NIF3 protein superfamily, are ubiquitous across superkingdoms. Proteins of this family have been widely annotated as “GTP cyclohydrolase I type 2” through electronic propagation based on a single study. Here, the annotation status of this protein family was examined through a comprehensive literature review and integrative bioinformatic analyses that revealed varied pleiotropic associations and phenotypes. This analysis, combined with functional complementation studies, strongly challenges the current annotation and suggests that DUF34 family members may serve as metal ion insertases, chaperones, or metallocofactor maturases. This general molecular function could explain how DUF34 subgroups participate in highly diversified pathways such as cell differentiation, metal ion homeostasis, pathogen virulence, redox, and universal stress responses.
Capturing the published corpus of information on all members of a given protein family should be an essential step in any study of a specific member of that family. Experimentalists often perform this step only superficially or partially, because the most common approaches and tools for this purpose are far from optimal. Using a previously gathered dataset of 284 references mentioning a member of the DUF34 (NIF3/Ngg1-interacting Factor 3) family, we evaluated the productivity of different databases and search tools and devised a workflow that experimentalists can use to capture the most information in the least time. To complement this workflow, we reviewed web-based platforms for exploring the distribution of members of several protein families across sequenced genomes or for capturing gene neighborhood information, assessing their versatility, completeness and ease of use. Recommendations for experimentalists and educators are provided and integrated within a customized, publicly accessible Wiki.
Dihydrouridine (D) is an abundant modified base found in the tRNAs of most living organisms and was recently detected in eukaryotic mRNAs. This base confers significant conformational plasticity to RNA molecules. The dihydrouridine biosynthetic reaction is catalyzed by a large family of flavoenzymes, the dihydrouridine synthases (Dus). So far, only bacterial Dus enzymes and their complexes with tRNAs have been structurally characterized. Understanding the structure-function relationships of eukaryotic Dus proteins has been hampered by the paucity of structural data. Here, we combined extensive phylogenetic analysis with high-precision 3D molecular modeling of more than 30 Dus2 enzymes selected along the tree of life to determine the evolutionary molecular basis of D biosynthesis by these enzymes. Dus2 is the eukaryotic enzyme responsible for the synthesis of D20 in tRNAs and is involved in some human cancers and in the detoxification of β-amyloid peptides in Alzheimer’s disease. In addition to the domains forming the canonical structure of all Dus, i.e., the catalytic TIM-barrel domain and the helical domain, both participating in RNA recognition in the bacterial Dus, a majority of Dus2 proteins harbor extensions at both ends. While these are mainly unstructured extensions on the N-terminal side, the C-terminal side extensions can adopt well-defined structures such as helices and beta-sheets or even form additional domains such as zinc finger domains. 3D models of Dus2/tRNA complexes were also generated. This study suggests that eukaryotic Dus2 proteins may have an advantage in tRNA recognition over their bacterial counterparts due to their modularity.