The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced direct-coupling analysis (DCA). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intradomain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and interdomain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.
statistical sequence analysis | residue-residue covariation | contact map prediction | maximum-entropy modeling
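The distinction between direct and indirect correlations can be illustrated with a toy calculation. The sketch below is not DCA itself: it only computes plain mutual information between alignment columns, and the three-column alignment, the two-letter alphabet, and the function names are all hypothetical. It shows why a purely local correlation measure is not enough: columns 0 and 2 never interact directly, yet they still score as correlated because both are tied to column 1.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p_ab / (p_a * p_b), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi


def toy_chain_alignment():
    """Hypothetical 32-sequence alignment over the letters A and L.

    Column 1 equals column 0 in 3/4 of the sequences, and column 2 equals
    column 1 in 3/4 of the sequences. Column 2 depends on column 0 only
    through column 1 (a chain 0 - 1 - 2).
    """
    other = {"A": "L", "L": "A"}
    rows = []
    for x in "AL":
        for y, n_y in ((x, 3), (other[x], 1)):
            for z, n_z in ((y, 3), (other[y], 1)):
                rows += [(x, y, z)] * (n_y * n_z)
    return rows


columns = list(zip(*toy_chain_alignment()))
print("MI(0,1):", mutual_information(columns[0], columns[1]))
print("MI(1,2):", mutual_information(columns[1], columns[2]))
print("MI(0,2):", mutual_information(columns[0], columns[2]))  # nonzero, but indirect
```

In this toy case the indirect pair (0, 2) scores lower than the direct pairs, but in a real alignment chains of strong direct couplings can produce indirect correlations that outrank true contacts, which is the problem a global model such as DCA is meant to address.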
In bacteria, the rate of cell proliferation and the level of gene expression are intimately intertwined. Elucidating these relations is important both for understanding the physiological functions of endogenous genetic circuits and for designing robust synthetic systems. We describe a phenomenological study that reveals intrinsic constraints governing the allocation of resources toward protein synthesis and other aspects of cell growth. A theory incorporating these constraints can accurately predict how cell proliferation and gene expression affect one another, quantitatively accounting for the effect of translation-inhibiting antibiotics on gene expression and the effect of gratuitous protein expression on cell growth. The use of such empirical relations, analogous to phenomenological laws, may facilitate our understanding and manipulation of complex biological systems before underlying regulatory circuits are elucidated.
Understanding the molecular determinants of specificity in protein-protein interaction is an outstanding challenge of postgenome biology. The availability of large protein databases generated from sequences of hundreds of bacterial genomes enables various statistical approaches to this problem. In this context covariance-based methods have been used to identify correlation between amino acid positions in interacting proteins. However, these methods have an important shortcoming, in that they cannot distinguish between directly and indirectly correlated residues. We developed a method that combines covariance analysis with global inference analysis, adopted from use in statistical physics. Applied to a set of >2,500 representatives of the bacterial two-component signal transduction system, the combination of covariance with global inference successfully and robustly identified residue pairs that are proximal in space without resorting to ad hoc tuning parameters, both for heterointeractions between sensor kinase (SK) and response regulator (RR) proteins and for homointeractions between RR proteins. The spectacular success of this approach illustrates the effectiveness of the global inference approach in identifying direct interaction based on sequence information alone. We expect this method to be applicable soon to interaction surfaces between proteins present in only 1 copy per genome as the number of sequenced genomes continues to expand. Use of this method could significantly increase the potential targets for therapeutic intervention, shed light on the mechanism of protein-protein interaction, and establish the foundation for the accurate prediction of interacting protein partners. The large majority of cellular functions are executed and controlled by interacting proteins.
With up to several thousand types of proteins expressed in a typical bacterial cell at a given time, their concerted specific interactions regulate the interplay of biochemical processes that are the essence of life. Many protein interactions are transient, allowing proteins to mate with several partners or travel in cellular space to perform their functions. Understanding these transient interactions is one of the outstanding challenges of systems biology (reviewed in ref. 1). The characterization of the molecular details of the interface formed between known interacting proteins is a requirement for understanding the molecular determinants of protein-protein interaction, the knowledge of which may be important for a variety of applications including synthetic biology, e.g., designing new specific interaction between proteins (reviewed in ref. 2), and pharmaceutics, e.g., protein interaction surfaces as drug targets (reviewed in ref. 3). Experimental approaches to identify surfaces of interaction between proteins such as surface-scanning mutagenesis and cocrystal structure generation are arduous and/or serendipitous. Cocrystal structures provide the best molecular resolution but are particularly challenging to obtain for transient interactio...
The expression of genes is regularly characterized with respect to how much, how fast, when and where. Such quantitative data demands quantitative models. Thermodynamic models are based on the assumption that the level of gene expression is proportional to the equilibrium probability that RNA polymerase (RNAP) is bound to the promoter of interest. Statistical mechanics provides a framework for computing these probabilities. Within this framework, interactions of activators, repressors, helper molecules and RNAP are described by a single function, the 'regulation factor'. This analysis culminates in an expression for the probability of RNA polymerase binding at the promoter of interest as a function of the number of regulatory proteins in the cell.
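As a rough illustration of the framework summarized above, the sketch below computes the equilibrium probability of RNAP occupancy with a multiplicative regulation factor. The abstract does not give the formula; the functional form used here is the commonly quoted two-state expression, and the parameter names (`n_rnap`, `n_ns`, `d_eps_pd`, `d_eps_rd`), the choice of simple repression as the example, and all numerical values are assumptions for illustration only. Energies are in units of k_B T.

```python
import math


def regulation_factor_simple_repression(n_rep, n_ns, d_eps_rd):
    """Regulation factor for one repressor site that excludes RNAP (assumed form).

    n_rep: number of repressors, n_ns: number of nonspecific DNA sites,
    d_eps_rd: repressor binding energy relative to nonspecific DNA (k_B T).
    """
    return 1.0 / (1.0 + (n_rep / n_ns) * math.exp(-d_eps_rd))


def p_bound(n_rnap, n_ns, d_eps_pd, f_reg=1.0):
    """Equilibrium probability that RNAP occupies the promoter.

    The regulation factor rescales the effective number of polymerases;
    f_reg = 1 corresponds to an unregulated promoter.
    """
    weight = (n_rnap / n_ns) * math.exp(-d_eps_pd) * f_reg
    return weight / (1.0 + weight)


# Hypothetical numbers: 1000 polymerases, 5e6 nonspecific sites.
unregulated = p_bound(1000, 5e6, d_eps_pd=-5.0)
f_reg = regulation_factor_simple_repression(10, 5e6, d_eps_rd=-15.0)
repressed = p_bound(1000, 5e6, d_eps_pd=-5.0, f_reg=f_reg)
print(unregulated, repressed)
```

A regulation factor below 1 lowers occupancy (repression) and a factor above 1 raises it (activation), so a single function of regulator copy number captures the effect of the regulatory architecture.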
Summary: Bacterial gene expression depends not only on specific regulations but also directly on bacterial growth, because important global parameters such as the abundance of RNA polymerases and ribosomes are all growth-rate dependent. Understanding these global effects is necessary for a quantitative understanding of gene regulation and for the robust design of synthetic genetic circuits. The observed growth-rate dependence of constitutive gene expression can be explained by a simple model using the measured growth-rate dependence of the relevant cellular parameters. More complex growth dependences for genetic circuits involving activators, repressors and feedback control were analyzed, and salient features were verified experimentally using synthetic circuits. The results suggest a novel feedback mechanism mediated by general growth-dependent effects and not requiring explicit gene regulation, if the expressed protein affects cell growth. This mechanism can lead to growth bistability and promote the acquisition of important physiological functions such as antibiotic resistance and tolerance (persistence).
Overflow metabolism refers to the seemingly wasteful strategy in which cells use fermentation instead of the more efficient respiration to generate energy, despite the availability of oxygen. Known as the Warburg effect in the context of cancer growth, this phenomenon occurs ubiquitously for fast-growing cells, including bacteria, fungi, and mammalian cells, but its origin has remained mysterious despite decades of research. Here we study metabolic overflow in E. coli and show that it is a global physiological response used to cope with changing proteomic demands of energy biogenesis and biomass synthesis under different growth conditions. A simple model of proteomic resource allocation can quantitatively account for all of the observed behaviors and accurately predict responses to novel perturbations. The key hypothesis of the model, that the proteome cost of energy biogenesis by respiration exceeds that by fermentation, is quantitatively confirmed by direct measurement of protein abundances via quantitative mass spectrometry.
Cells receive a wide variety of cellular and environmental signals, which are often processed combinatorially to generate specific genetic responses. Here we explore theoretically the potentials and limitations of combinatorial signal integration at the level of cis-regulatory transcription control. Our analysis suggests that many complex transcription-control functions of the type encountered in higher eukaryotes are already implementable within the much simpler bacterial transcription system. Using a quantitative model of bacterial transcription and invoking only specific protein-DNA interaction and weak glue-like interaction between regulatory proteins, we show explicit schemes to implement regulatory logic functions of increasing complexity by appropriately selecting the strengths and arranging the relative positions of the relevant protein-binding DNA sequences in the cis-regulatory region. The architectures that emerge are naturally modular and evolvable. Our results suggest that the transcription regulatory apparatus is a "programmable" computing machine, belonging formally to the class of Boltzmann machines. Crucial to our results is the ability to regulate gene expression at a distance. In bacteria, this can be achieved for isolated genes via DNA looping controlled by the dimerization of DNA-bound proteins. However, if adopted extensively in the genome, long-distance interaction can cause unintentional intergenic cross talk, a detrimental side effect difficult to overcome by the known bacterial transcription-regulation systems. This may be a key factor limiting the genome-wide adoption of complex transcription control in bacteria. Implications of our findings for combinatorial transcription control in eukaryotes are discussed. Biological organisms ranging from bacteria to humans possess an enormous repertoire of genetic responses to ever-changing combinations of cellular and environmental signals.
To a large extent, this repertoire is encoded in complex networks of genes closely regulating the activities of each other. Characterizing and decoding the connectivity of gene regulatory networks has been an outstanding challenge of post-genome molecular biology (1-4). However, unlike integrated circuits, which process information through synchronized cascades of many simple and fast nodes and for which connectivity is the primary source of network complexity, a gene-regulatory network typically consists of only a few tens to hundreds of genes, the expression of which is slow and asynchronous (5). Yet these "nodes" are very sophisticated in their capacity to integrate signals: In eukaryotes, each node can be regulated combinatorially, often by four to five other nodes (1, 6), and the regulatory control function can be extremely complex (7). Here we focus primarily on one node of a gene-regulatory network and investigate quantitatively the power and limitations of combinatorial gene regulation in the context of bacterial transcription. We find that the bacterial transcription system is already capable of im...
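The idea that binding-site strengths and weak protein-protein contacts can encode a logic function can be sketched with a small Boltzmann-weight enumeration. This is an illustrative toy under stated assumptions, not the scheme from the paper: the site names, the binding weights, the cooperativity factor of 20, and the helper names `promoter_occupancy` and `and_like_response` are all hypothetical.

```python
from itertools import product


def promoter_occupancy(q, omega, rnap="P"):
    """Probability that RNAP is bound, summed over all occupancy states.

    q: dict mapping each site to the statistical weight of it being bound.
    omega: dict mapping frozenset({a, b}) to the cooperativity factor that
           applies when both a and b are bound (1.0 if absent).
    """
    names = list(q)
    z_on = z_off = 0.0
    for occupancy in product((0, 1), repeat=len(names)):
        bound = [n for n, o in zip(names, occupancy) if o]
        weight = 1.0
        for n in bound:
            weight *= q[n]
        for i, a in enumerate(bound):
            for b in bound[i + 1:]:
                weight *= omega.get(frozenset((a, b)), 1.0)
        if rnap in bound:
            z_on += weight
        else:
            z_off += weight
    return z_on / (z_on + z_off)


def and_like_response(a_present, b_present):
    """Weak promoter plus two activators, each contacting RNAP (assumed values)."""
    q = {
        "P": 0.01,                         # weak promoter
        "A": 10.0 if a_present else 0.0,   # activator A site
        "B": 10.0 if b_present else 0.0,   # activator B site
    }
    omega = {frozenset(("A", "P")): 20.0, frozenset(("B", "P")): 20.0}
    return promoter_occupancy(q, omega)


for a, b in product((False, True), repeat=2):
    print(a, b, round(and_like_response(a, b), 3))
```

With these assumed numbers the promoter is mostly occupied only when both activators are present, because the two recruitment factors multiply; changing the weights or the pairs that interact changes the function being computed.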
The cyclic AMP (cAMP)-dependent catabolite repression effect in E. coli is among the most intensely studied regulatory processes in biology. However, the physiological function(s) of cAMP signalling and its molecular triggers remain elusive. Here we use a quantitative physiological approach to show that cAMP signalling tightly coordinates the cell’s protein expression program with its metabolic needs during exponential cell growth: The expression of carbon catabolic genes increased linearly with decreasing growth rate upon limitation of carbon influx, but decreased linearly with decreasing growth rate upon limitation of nitrogen or sulfur influx. In contrast, the expression of biosynthetic genes exhibited the opposite linear growth-rate dependence to that of the catabolic genes. A coarse-grained mathematical model provides a quantitative framework for understanding and predicting gene expression responses to catabolic and anabolic limitations. A scheme of integral feedback control featuring the inhibition of cAMP signalling by metabolic precursors is proposed and validated. These results reveal a key physiological role of cAMP-dependent catabolite repression: to ensure that proteomic resources are spent on distinct metabolic sectors as needed in different nutrient environments. Our finding underscores the power of quantitative physiology in unravelling the underlying functions of complex molecular signalling networks.