The millions of protein sequences generated by genomics are expected to transform protein engineering and personalized medicine. To achieve these goals, tools for predicting outcomes of amino acid changes must be improved. Currently, advances are hampered by insufficient experimental data about nonconserved amino acid positions. Since the property “nonconserved” is identified using a sequence alignment, we designed experiments to recapitulate that context: Mutagenesis and functional characterization were carried out in 15 LacI/GalR homologs (rows) at 12 nonconserved positions (columns). Multiple substitutions were made at each position to reveal how the various amino acids of a nonconserved column were tolerated in each protein row. Results showed that amino acid preferences of nonconserved positions were highly context-dependent, showed few correlations with physico-chemical similarities, and were not predictable from their occurrence in natural LacI/GalR sequences. Further, unlike the “toggle switch” behaviors of conserved positions, substitutions at nonconserved positions could be rank-ordered to show a “rheostatic”, progressive effect on function that spanned several orders of magnitude. Comparisons to various sequence analyses suggested that conserved and strongly co-evolving positions act as functional toggles, whereas other important, nonconserved positions serve as rheostats for modifying protein function. Both the presence of rheostat positions and the sequence analysis strategy appear to be generalizable to other protein families and should be considered when engineering protein modifications or predicting the impact of protein polymorphisms.
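The rank-ordering idea above can be illustrated with a minimal Python sketch. The substitution labels and repression values are hypothetical placeholders invented for illustration, not data from the study. The code sorts the substitutions at one position from highest to lowest function and reports how many orders of magnitude the outcomes span, computed as log10(max/min).

```python
import math
from typing import Dict, List, Tuple


def rank_substitutions(values: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (substitution, value) pairs sorted from highest to lowest function."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


def orders_of_magnitude_spanned(values: Dict[str, float]) -> float:
    """Return log10(max / min) of the measurements; all must be strictly positive."""
    measured = list(values.values())
    if not measured or min(measured) <= 0:
        raise ValueError("need at least one strictly positive measurement")
    return math.log10(max(measured) / min(measured))


# Hypothetical repression values (arbitrary units) for five substitutions at a
# single nonconserved position. Illustrative only; NOT measurements from the study.
hypothetical = {"A": 500.0, "D": 5.0, "K": 50.0, "L": 0.5, "S": 200.0}

for aa, value in rank_substitutions(hypothetical):
    print(f"{aa}: {value}")
print(f"span: {orders_of_magnitude_spanned(hypothetical):.1f} orders of magnitude")
```

With these made-up numbers, the ranking runs A, S, K, D, L and the span is 3.0 orders of magnitude. A rheostatic position would populate many intermediate values along such a ranking. A toggle position would instead cluster near "wild-type" and "dead".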
LacI/GalR transcription regulators have extensive, non-conserved interfaces between their regulatory domains and the 18 amino acids that serve as ‘linkers’ to their DNA-binding domains. These non-conserved interfaces might contribute to functional differences between paralogs. Previously, two chimeras created by domain recombination displayed novel functional properties. Here, we present a synthetic protein family, which was created by joining the LacI DNA-binding domain/linker to seven additional regulatory domains. Despite ‘mismatched’ interfaces, chimeras maintained allosteric response to their cognate effectors. Therefore, allostery in many LacI/GalR proteins does not require interfaces with precisely matched interactions. Nevertheless, the chimeric interfaces were not silent to mutagenesis, and preliminary comparisons suggest that the chimeras provide an ideal context for systematically exploring functional contributions of non-conserved positions. DNA looping experiments revealed higher order (dimer–dimer) oligomerization in several chimeras, which might be possible for the natural paralogs. Finally, the biological significance of repression differences was determined by measuring bacterial growth rates on lactose minimal media. Unexpectedly, moderate and strong repressors showed an apparent induction phase, even though inducers were not provided; therefore, an unknown mechanism might contribute to regulation of the lac operon. Nevertheless, altered growth correlated with altered repression, which indicates that observed functional modifications are significant.
Human mutations often cause amino acid changes (variants) that can alter protein function or stability. Some variants fall at protein positions that experimentally exhibit "rheostatic" mutation outcomes (different amino acid substitutions lead to a range of functional outcomes). In ongoing studies of rheostat positions, we encountered the need to aggregate experimental results from multiple variants to describe the overall roles of individual positions. Here, we present "RheoScale", which generates quantitative scores to discriminate rheostat positions from those with "toggle" (most substitutions abolish function) or "neutral" (most substitutions have wild-type function) outcomes. RheoScale scores facilitate correlations of experimental data (such as binding affinity or stability) with structural and bioinformatic analyses. The RheoScale calculator is implemented both as a Microsoft Excel workbook and as an R script. Example analyses are shown for three model protein systems, including one assessed via deep mutational scanning. The RheoScale calculator quickly and efficiently provided quantitative descriptions that were in good agreement with prior qualitative observations. As an example application, scores were compared to the example proteins' structures; strong rheostat positions tended to occur in dynamic locations. In the future, RheoScale scores can be easily integrated into computational studies to facilitate improved algorithms for predicting outcomes of human variants.
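The three categories defined above (toggle, neutral, rheostat) can be made concrete with a crude rule. The Python sketch below is a simplified, hypothetical classifier. It is not RheoScale's scoring algorithm, which is defined in the published Excel workbook and R script. It assumes each variant's function has been normalized to wild type (1.0 = wild-type-like, 0.0 = no detectable function). The thresholds `dead_cutoff`, `wt_fold`, and `majority` are arbitrary values chosen for illustration only.

```python
from typing import Sequence


def classify_position(relative_function: Sequence[float],
                      dead_cutoff: float = 0.1,
                      wt_fold: float = 2.0,
                      majority: float = 0.5) -> str:
    """Label a position 'toggle', 'neutral', or 'rheostat' (hypothetical rule).

    relative_function: measured function of each variant divided by wild type.
    dead_cutoff: values at or below this count as "function abolished" (assumed).
    wt_fold: values within this fold of 1.0 count as wild-type-like (assumed).
    majority: fraction of variants needed to call toggle or neutral (assumed).
    """
    n = len(relative_function)
    if n == 0:
        raise ValueError("need at least one variant")
    if wt_fold < 1.0:
        raise ValueError("wt_fold must be >= 1")
    dead = sum(1 for v in relative_function if v <= dead_cutoff)
    wt_like = sum(1 for v in relative_function if 1.0 / wt_fold <= v <= wt_fold)
    if dead / n > majority:
        return "toggle"    # most substitutions abolish function
    if wt_like / n > majority:
        return "neutral"   # most substitutions retain wild-type-like function
    return "rheostat"      # outcomes are spread across a range of values


# Hypothetical relative-function values for three imaginary positions.
print(classify_position([0.0, 0.01, 0.05, 0.02, 1.0]))  # mostly dead
print(classify_position([0.9, 1.1, 1.5, 0.7, 0.05]))    # mostly wild-type-like
print(classify_position([0.02, 0.2, 0.6, 3.0, 10.0]))   # spread out
```

These three calls return "toggle", "neutral", and "rheostat", respectively. Because the cutoffs are arbitrary, moving `dead_cutoff` or `wt_fold` can change a label. Continuous scores, such as those RheoScale reports, avoid that sensitivity.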
Summary
The lactose repressor protein (LacI) was among the very first genetic regulatory proteins discovered, and more than 1000 members of the bacterial LacI/GalR family have now been identified. LacI has been the prototype for understanding how transcription is controlled by small metabolites that modulate protein association with specific DNA sites. This understanding has been greatly expanded by the study of other LacI/GalR homologues. A general picture emerges in which the conserved fold provides a scaffold for multiple types of interactions (including oligomerization, small molecule binding, and protein•protein binding) that in turn influence target DNA binding and thereby regulate mRNA production. Although many different functions have evolved from this basic scaffold, each homologue retains functional flexibility: For the same protein, different small molecules can have disparate impacts on DNA binding and hence on transcriptional outcome. In turn, binding to alternative DNA sequences may affect the degree of allosteric response. Thus, this family exhibits a symphony of variations by which transcriptional control is achieved.
Overview of the LacI/GalR family
In virtually all bacteria, LacI/GalR family members regulate transcription for a wide range of processes. The family was first catalogued in 1992 by Weickert and Adhya [1]; sequences of >1000 characterized and hypothetical homologues are now known (2008 BLAST search of SwissProt). These proteins have not been found in archaebacteria or eukaryotes, although proteins with homologous domains are ubiquitous.
The LacI/GalR family can be divided into >33 paralogue groups that appear to derive from an ancestral gene. As many as 22 paralogues co-exist in a single species. Many members coordinate available nutrients with expression of catabolic genes [1], but some regulate processes as diverse as nucleotide biosynthesis and toxin expression (e.g. [2,3]).
Two members are "master" regulators: homologues CcpA and CRA control expression of enzymes that determine carbon flow in Gram-positive and Gram-negative bacteria, respectively. If these key proteins are disabled, virulence is altered in several pathogens (e.g. [4,5•,6•,7]).
The common function of the LacI/GalR proteins, which features allosteric regulation of DNA binding to modulate transcription, is shown in Figure 1. Each homologue has evolved a unique variation: In addition to binding specific "operator" DNA sequences, each protein exhibits specificity for distinct effector ligands. Although most members repress transcription, some
Corresponding Author: Swint-Kruse, Liskin (E-mail: lswint-kruse@kumc.edu).