The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced direct-coupling analysis (DCA). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intradomain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and interdomain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences, which are rapidly becoming available thanks to advances in genome sequencing.

statistical sequence analysis | residue-residue covariation | contact map prediction | maximum-entropy modeling
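As a rough illustration of how direct couplings can be separated from raw correlations, here is a minimal pure-Python sketch. It assumes the mean-field approximation, which the text above does not name, so treat that as an assumption. Under that approximation the couplings are read off the negated inverse of the pseudocounted connected-correlation matrix. The sketch ranks pairs by the Frobenius norm of each coupling block with an average-product correction, a scoring choice used in later DCA work, rather than by a direct-information score. It also omits sequence reweighting. The names `mfdca_scores`, `invert`, and the pseudocount weight `lam` are illustrative only.

```python
from itertools import combinations


def invert(A):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    n = len(A)
    aug = [row[:] + [1.0 if k == r else 0.0 for k in range(n)] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mfdca_scores(msa, q, lam=0.5):
    """Mean-field DCA sketch: returns {(i, j): APC-corrected coupling score}.

    msa: equal-length sequences of integer states in range(q).
    lam: pseudocount weight mixing observed frequencies with a uniform distribution.
    No sequence reweighting is applied (a simplification).
    """
    M, L, d = len(msa), len(msa[0]), q - 1
    # Pseudocounted single-site frequencies f_i(a).
    f1 = [[lam / q] * q for _ in range(L)]
    for seq in msa:
        for i, a in enumerate(seq):
            f1[i][a] += (1 - lam) / M
    # Connected correlations C_{ia,jb} over states 0..q-2; state q-1 is the gauge reference.
    C = [[0.0] * (L * d) for _ in range(L * d)]
    for i in range(L):
        for a in range(d):
            for b in range(d):
                C[i * d + a][i * d + b] = (f1[i][a] if a == b else 0.0) - f1[i][a] * f1[i][b]
    for i, j in combinations(range(L), 2):
        f2 = [[lam / q ** 2] * q for _ in range(q)]  # pseudocounted pair frequencies
        for seq in msa:
            f2[seq[i]][seq[j]] += (1 - lam) / M
        for a in range(d):
            for b in range(d):
                c = f2[a][b] - f1[i][a] * f1[j][b]
                C[i * d + a][j * d + b] = C[j * d + b][i * d + a] = c
    Cinv = invert(C)
    S = {}
    for i, j in combinations(range(L), 2):
        # Mean-field couplings e_ij(a, b) = -(C^-1)_{ia,jb}; the reference state stays 0.
        e = [[0.0] * q for _ in range(q)]
        for a in range(d):
            for b in range(d):
                e[a][b] = -Cinv[i * d + a][j * d + b]
        # Shift to the zero-sum gauge, then take the Frobenius norm of the q x q block.
        row = [sum(r) / q for r in e]
        col = [sum(e[a][b] for a in range(q)) / q for b in range(q)]
        tot = sum(row) / q
        S[(i, j)] = sum((e[a][b] - row[a] - col[b] + tot) ** 2
                        for a in range(q) for b in range(q)) ** 0.5
    # Average-product correction (assumes at least one nonzero score).
    mean_i = [sum(S[tuple(sorted((i, k)))] for k in range(L) if k != i) / (L - 1) for i in range(L)]
    mean_all = sum(S.values()) / len(S)
    return {(i, j): s - mean_i[i] * mean_i[j] / mean_all for (i, j), s in S.items()}


if __name__ == "__main__":
    # Toy alignment: column 1 copies column 0; columns 2 and 3 vary independently.
    msa = [[a, a, c, d] for a in range(3) for c in range(3) for d in range(3)]
    scores = mfdca_scores(msa, q=3)
    print(max(scores, key=scores.get))  # prints (0, 1)
```

The dense inversion here scales cubically in `L * (q - 1)`. A real alignment of protein domains would call for a numerical library and sequence reweighting; the pure-Python version only keeps each step visible.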
Predictive understanding of the myriad signal transduction pathways in a cell is an outstanding challenge of systems biology. Such pathways are primarily mediated by specific but transient protein-protein interactions, which are difficult to study experimentally. In this study, we dissect the specificity of protein-protein interactions governing two-component signaling (TCS) systems ubiquitously used in bacteria. Exploiting the large number of sequenced bacterial genomes and an operon structure that packages many pairs of interacting TCS proteins together, we developed a computational approach to extract a molecular interaction code capturing the preferences of a small but critical number of directly interacting residue pairs. This code is found to reflect physical interaction mechanisms, with the strongest signal coming from charged amino acids. We use it to predict the specificity of TCS interactions: our results compare favorably with most available experimental results, including the prediction of 7 of the 8 known interaction partners of orphan signaling proteins in Caulobacter crescentus. A survey of the available bacterial genomes suggests that 15–25% of TCS proteins could participate in out-of-operon “crosstalk”. Additionally, we predict clusters of crosstalking candidates, expanding on the anecdotally known examples in model organisms. The tools and results presented here can be used to guide experimental studies toward a system-level understanding of two-component signaling.
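To make the idea of an interaction code concrete, here is a toy sketch. It scores a candidate kinase/regulator pair by summing pairwise terms over a fixed set of inter-protein residue pairs and then ranks the candidate partners. Both the contact positions and the charge-based pair table are hypothetical stand-ins for couplings that would actually be inferred from operon-paired sequences. The table merely echoes the observation that the strongest signal comes from charged amino acids. The names `toy_coupling`, `interaction_score`, and `rank_partners` are illustrative only.

```python
# Hypothetical charge-based pair term standing in for couplings inferred from data.
POSITIVE = set("KR")
NEGATIVE = set("DE")


def toy_coupling(a, b):
    """Reward opposite charges, penalize like charges, ignore everything else."""
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return 1.0
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return -1.0
    return 0.0


def interaction_score(hk, rr, contacts):
    """Sum pair terms over (HK position, RR position) pairs assumed to be in contact."""
    return sum(toy_coupling(hk[i], rr[j]) for i, j in contacts)


def rank_partners(hk, candidates, contacts):
    """Return candidate RR names sorted from most to least favorable score."""
    return sorted(candidates,
                  key=lambda name: interaction_score(hk, candidates[name], contacts),
                  reverse=True)


if __name__ == "__main__":
    contacts = [(0, 0), (1, 1), (2, 2)]  # hypothetical interface positions
    hk = "KED"
    candidates = {"RR_a": "DKK", "RR_b": "KED", "RR_c": "AAA"}
    print(rank_partners(hk, candidates, contacts))  # prints ['RR_a', 'RR_c', 'RR_b']
```

With learned couplings in place of `toy_coupling`, the same ranking step would flag the top-scoring non-cognate partners as crosstalk candidates.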
Based on alleged functional residue correspondences between FucP and LacY, a recent study proposed a model of 3-TMS unit rearrangements [Madej et al.: Proc Natl Acad Sci USA 2013;110:5870-5874]. We rebut this theory using 7 different lines of evidence. Our observations suggest that these two transporters are homologous throughout their lengths, having evolved from a common ancestor without repeat-unit rearrangements. We exploit the availability of high-resolution XylE crystal structures in multiple conformations, including the inward-facing state, to enable direct comparisons with LacY. Based on a Δdistance map, we confirm the conclusion of Quistgaard et al. [Nat Struct Mol Biol 2013;20:766-768] that the N-terminal 6-TMS halves of these transporters are internally less mobile than the C-terminal halves during the conformational transitions from the outward occluded state to the inward occluded state and from the inward occluded state to the inward open state. These observations, together with those of Madej et al. [2013], lead us to suggest that functionally equivalent catalytic residues involved in substrate binding and transport catalysis evolved in dissimilar positions, though apparently often in similar positions within the putative 3-TMS repeat units, from a single structural scaffold without intragenic rearrangement.
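A Δdistance map compares two conformations of the same protein. The sketch below assumes the common construction, which the text does not spell out: the difference between the two inter-residue (e.g., Cα) distance matrices. Internal mobility of a segment is then summarized as the mean |Δ| over residue pairs inside that segment. The function names are illustrative, and the residue ranges standing in for the two 6-TMS halves are hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g., C-alpha atoms)."""
    return [[math.dist(p, r) for r in coords] for p in coords]


def delta_distance_map(coords_a, coords_b):
    """Delta[i][j] = d_B(i, j) - d_A(i, j); residues must be matched one-to-one."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must have the same residues")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    return [[db[i][j] - da[i][j] for j in range(len(da))] for i in range(len(da))]


def internal_mobility(delta, start, end):
    """Mean |Delta| over residue pairs i < j that both lie in [start, end)."""
    vals = [abs(delta[i][j]) for i in range(start, end) for j in range(i + 1, end)]
    return sum(vals) / len(vals)


if __name__ == "__main__":
    # Toy 6-residue chain: the first half is unchanged, the second half is stretched.
    state_a = [(float(i), 0.0, 0.0) for i in range(6)]
    state_b = state_a[:3] + [(3.0 + 2.0 * (i - 3), 0.0, 0.0) for i in range(3, 6)]
    delta = delta_distance_map(state_a, state_b)
    print(internal_mobility(delta, 0, 3), internal_mobility(delta, 3, 6))
```

Because the map uses only internal distances, it needs no structural superposition, which makes it convenient for comparing how rigid each half is across states.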
The objective of this shared task is to produce an inflected form of a word, given its lemma and a set of tags describing the attributes of the desired form. In this paper, we describe a transformer-based model that uses a bidirectional decoder to perform this task, and we evaluate its performance on the 90 languages, spanning 18 language families, used in this task.
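A sequence-to-sequence model for this task needs each example turned into source and target token sequences. The sketch below shows one common serialization. It assumes a tab-separated `lemma`, `form`, `TAG;TAG;...` line layout and places tag tokens before the lemma characters. Both the layout and the ordering are assumptions here, not necessarily what this system uses, and `serialize` is an illustrative name.

```python
def serialize(line):
    """Split 'lemma<TAB>form<TAB>TAG;TAG;...' into source and target token lists."""
    lemma, form, tags = line.rstrip("\n").split("\t")
    # Tag tokens first, then lemma characters (an assumed ordering).
    source = [f"<{tag}>" for tag in tags.split(";")] + list(lemma)
    target = list(form)
    return source, target


if __name__ == "__main__":
    src, tgt = serialize("walk\twalked\tV;PST\n")
    print(src)  # ['<V>', '<PST>', 'w', 'a', 'l', 'k']
    print(tgt)
```

Wrapping each tag in angle brackets keeps tag tokens from colliding with single-character lemma tokens in a shared vocabulary.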