Abstract: Millions of protein database entries are not assigned reliable functions, preventing the full understanding of chemical diversity in living organisms. Here, we describe an integrated strategy for the discovery of various enzymatic activities catalyzed within protein families of unknown or little known function. This approach relies on the definition of a generic reaction conserved within the family, high-throughput enzymatic screening on representatives, structural and modeling investigations and analysis of g…
“…Although these methods outperform historical methods, continued improvement is necessary to ensure accurate annotation of function (2). A greater swath of functional space can be covered by screening substrates in a high-throughput manner on multiple enzymes from a family (3,4). Family-wide substrate profiling offers a data-rich resource.…”
Large-scale activity profiling of enzyme superfamilies provides information about cellular functions as well as the intrinsic binding capabilities of conserved folds. Herein, the functional space of the ubiquitous haloalkanoate dehalogenase superfamily (HADSF) was revealed by screening a customized substrate library against >200 enzymes from representative prokaryotic species, enabling inferred annotation of ∼35% of the HADSF. An extremely high level of substrate ambiguity was revealed, with the majority of HADSF enzymes using more than five substrates. Substrate profiling allowed assignment of function to previously unannotated enzymes with known structure, uncovered potential new pathways, and identified isofunctional orthologs from evolutionarily distant taxonomic groups. Intriguingly, the HADSF subfamily having the least structural elaboration of the Rossmann fold catalytic domain was the most specific, consistent with the concept that domain insertions drive the evolution of new functions and that the broad specificity observed in HADSF may be a relic of this process.
evolution | specificity | phosphatase | substrate screen | promiscuity
Since the first genomes were sequenced, there has been an exponential increase in the number of protein sequences deposited into databases worldwide. At the time of this writing the UniProtKB/TrEMBL database contains over 32 million protein sequences. Although this increase in sequence data has dramatically enhanced our understanding of the genomic organization of organisms, as the number of protein sequences grows, the proportion of firm functional assignments diminishes. Traditionally, methods of functional annotation involve comparing sequence identity between experimentally characterized proteins and newly sequenced ones, typically via BLAST (1). In cases where significant sequence similarity cannot be ascertained, proteins are annotated as "hypothetical" or "putative."
Moreover, the decrease in sequence identity leads to an increased uncertainty in functional assignment, especially as the phylogenetic distance between organisms grows, limiting iso-functional ortholog discovery.
As the number of newly sequenced genomes grows larger, more protein sequences are likely to be misannotated, oftentimes resulting in the propagation of incorrect functional annotation across newly identified sequences. To tackle the problem of unannotated or misannotated proteins, newer methods for computational assignment have been created with varying degrees of success (2). Although these methods outperform historical methods, continued improvement is necessary to ensure accurate annotation of function (2). A greater swath of functional space can be covered by screening substrates in a high-throughput manner on multiple enzymes from a family (3, 4). Family-wide substrate profiling offers a data-rich resource. The use of sparse screening of sequence space and a diversified library permits the determination of substrate specificity profiles to provide a familywide view of the range of substrates...
“…Orthogonal to the genome-wide study discussed above, Bastard et al (30) demonstrated the use of an integrated strategy for exploring the functional diversity of a previously undescribed enzyme family, DUF849 (see Fig. 1 for general reaction and representative substrates).…”
Section: Screening To Assess Substrate Ambiguity
mentioning
Catalytic promiscuity and substrate ambiguity are keys to evolvability, which in turn is pivotal to the successful acquisition of novel biological functions. Action on multiple substrates (substrate ambiguity) can be harnessed for performance of functions in the cell that supersede catalysis of a single metabolite. These functions include proofreading, scavenging of nutrients, removal of antimetabolites, balancing of metabolite pools, and establishing system redundancy. In this review, we present examples of enzymes that perform these cellular roles by leveraging substrate ambiguity and then present the structural features that support both specificity and ambiguity. We focus on the phosphatases of the haloalkanoate dehalogenase superfamily and the thioesterases of the hotdog fold superfamily.
In the 1990s, a series of studies on the evolution of catalysis in protein fold families helped define contemporary understanding of enzymes as potentially promiscuous catalysts; the analyses of these enzyme superfamilies suggested that certain folds showed higher variability than expected with regard to the chemistries that can be catalyzed or the substrates that can be acted on (1-11). To summarize, the current model holds that enzyme families grow as a result of gene duplication coupled with the acquisition of an advantageous new function. Because the backbone folds, and thus, the catalytic scaffolds are inherited, so is the chemical trait that underlies the intrinsic catalytic functions of all family members. In enzyme families, evidence can be found for low level intrinsic activity associated with one or more extant members, co-existing with the high level of activity unique to the subject enzyme (see for instance, the enolase and alkaline phosphatase enzyme superfamilies (12, 13)). The ability to carry out such alternate chemistry is termed catalytic promiscuity.
The plausible link between catalytic promiscuity and evolvability has been explored in previous publications (for recent coverage and reviews of this topic, see Refs. 14-17).
The most commonly encountered observation of promiscuity involves the catalysis of one type of chemistry with many different substrates. Jensen (18) referred to this trait as "substrate ambiguity," and this is the name we will use. Herein, we examine the selective advantage associated with activity toward multiple substrates by highlighting specific examples of enzymes for which the level of substrate ambiguity runs high to fulfill specific roles in the cell. We use as examples enzymes from the haloalkanoate dehalogenase (HAD) superfamily and the thioesterases of the hotdog fold superfamily. In addition, we dissect the architectures of enzymes from these families to discover underlying structural sources of specificity and substrate ambiguity.
Screening to Assess Substrate Ambiguity
In vitro enzyme activity measurements carried out with a structurally diverse library of potential substrates allow one to generate a substrate specificity profile for the enzyme of interest. However, the mo...
“…Many of the widespread DUFs in bacteria are biologically essential, and the importance of prioritizing these DUFs has been recognized in recent years (10). Before this study, only one DUF family, the DUF849 family of 922 proteins, had been evaluated in large scale by a single report (11), which heavily relied on previously published enzymatic activity and catalytic mechanism/liganded structure of one of its members (35,36). Recognizing the difficulties associated with DUF functional assignment, we attempt to apply an integrated "genomic enzymology" strategy that had only been used for functional characterization of members of families with known functions (37,38).…”
Section: Synergistic Analysis Of SSNs and GNNs Enables The Prediction
mentioning
confidence: 99%
“…However, the assignment of functions to uncharacterized proteins is challenging (11)(12)(13). The methods now available for discovering the functions of uncharacterized proteins, including DUFs, are inefficient and often depend on inference from the functions of characterized homologs (11). Therefore, new strategies are required to confront and solve the functional assignment challenge.…”
mentioning
confidence: 99%
“…Thus, reliable experimental identification of the in vitro enzymatic activities and in vivo physiological functions for the DUF families is important. However, the assignment of functions to uncharacterized proteins is challenging (11)(12)(13). The methods now available for discovering the functions of uncharacterized proteins, including DUFs, are inefficient and often depend on inference from the functions of characterized homologs (11).…”
Using a large-scale "genomic enzymology" approach, we (i) assigned novel ATP-dependent four-carbon acid sugar kinase functions to members of the DUF1537 protein family (domain of unknown function; Pfam families PF07005 and PF17042) and (ii) discovered novel catabolic pathways for D-threonate, L-threonate, and D-erythronate. The experimentally determined ligand specificities of several solute binding proteins (SBPs) for TRAP (tripartite ATP-independent permease) transporters for four-carbon acids, including D-erythronate and L-erythronate, were used to constrain the substrates for the catabolic pathways that degrade the SBP ligands to intermediates in central carbon metabolism. Sequence similarity networks and genome neighborhood networks were used to identify the enzyme components of the pathways. Conserved genome neighborhoods encoded SBPs as well as permease components of the TRAP transporters, members of the DUF1537 family, and a member of the 4-hydroxy-L-threonine 4-phosphate dehydrogenase (PdxA) oxidative decarboxylase, class II aldolase, or ribulose 1,5-bisphosphate carboxylase/oxygenase, large subunit (RuBisCO) superfamily. Because the characterized substrates of members of the PdxA, class II aldolase, and RuBisCO superfamilies are phosphorylated, we postulated that the members of the DUF1537 family are novel ATP-dependent kinases that participate in catabolic pathways for four-carbon acid sugars. We determined that (i) the DUF1537/PdxA pair participates in a pathway for the conversion of D-threonate to dihydroxyacetone phosphate and CO2 and (ii) the DUF1537/class II aldolase pair participates in pathways for the conversion of D-erythronate and L-threonate (epimers at carbon-3) to dihydroxyacetone phosphate and CO2. The physiological importance of these pathways was demonstrated in vivo by phenotypic and genetic analyses.
DUF1537 | kinase | four-carbon acid sugars | conserved genome neighborhoods | genomic enzymology