Variational autoencoders are unsupervised learning models with generative capabilities. When applied to protein data, they classify sequences by phylogeny and generate de novo sequences that preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we use direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings as well as functional and fitness properties of several systems, including globins, β-lactamases, ion channels, and transcription factors. We show how the landscape helps explain the effects of sequence variability observed in experimental data and provides insights into directed and natural protein evolution. We propose that combining the generative properties and functional predictive power of variational autoencoders with coevolutionary analysis could be beneficial in applications for protein engineering and design.
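For readers unfamiliar with the Potts model used in direct coupling analysis, its standard form can be written as below. The symbols are the conventional ones and are not taken from the abstract: $s_i$ is the residue at alignment position $i$, $J_{ij}$ are pairwise couplings, and $h_i$ are local fields. The last line is only an assumed reading of how a latent coordinate $\mathbf{z}$ could be assigned an energy (decode the most probable sequence, then score it); the abstract does not spell out this step.

```latex
% Standard Potts Hamiltonian over a sequence s = (s_1, ..., s_L)
H(\mathbf{s}) = -\sum_{1 \le i < j \le L} J_{ij}(s_i, s_j) \;-\; \sum_{i=1}^{L} h_i(s_i)

% Associated Boltzmann-like sequence probability (Z is the partition function)
P(\mathbf{s}) = \frac{1}{Z}\, e^{-H(\mathbf{s})}

% Assumed construction of a latent landscape: score the decoder's
% most probable sequence at each latent coordinate z
\hat{s}_i(\mathbf{z}) = \arg\max_{a}\; p_\theta(s_i = a \mid \mathbf{z}),
\qquad
H(\mathbf{z}) = H\big(\hat{\mathbf{s}}(\mathbf{z})\big)
```

Under this convention, lower values of $H$ correspond to sequences the model considers more probable.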
Two-component systems (TCS) are signaling machinery consisting of a histidine kinase (HK) and a response regulator (RR). When an environmental change is detected, the HK phosphorylates its cognate RR. Although cognate interactions were once considered orthogonal, experimental evidence shows that crosstalk between non-cognate HK–RR pairs is prevalent. To date, crosstalk interactions have been demonstrated for TCS proteins in only a limited number of organisms. By providing specificity predictions across entire TCS networks for a large variety of organisms, the ELIHKSIR web server assists users in identifying interactions for TCS proteins and their mutants. To generate specificity scores, a global probabilistic model was used to identify interfacial couplings and local fields from sequence information. These couplings and local fields were then used to construct Hamiltonian scores over positions with encoded specificity, yielding the specificity score. These methods were applied to the 6676 organisms available on the ELIHKSIR web server. Because users can mutate proteins and display the resulting network changes, the number of TCS networks that can be analyzed with ELIHKSIR is nearly endless. ELIHKSIR allows users to perform a variety of TCS network analyses and visualizations to support TCS research efforts.
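The scoring step described above can be illustrated with a minimal Python sketch. It is not the ELIHKSIR implementation: the function names, array layouts, alphabet ordering, and sign convention (lower score means more favorable pairing) are all assumptions made for illustration, and the couplings and fields are taken as already inferred from sequence data.

```python
import numpy as np

# Hypothetical alphabet ordering: gap plus the 20 amino acids (q = 21).
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: k for k, aa in enumerate(ALPHABET)}


def encode(seq):
    """Map an aligned sequence string to integer state indices."""
    return np.array([INDEX[aa] for aa in seq], dtype=int)


def specificity_score(hk, rr, couplings, fields_hk, fields_rr, pairs):
    """Potts-style Hamiltonian restricted to interfacial position pairs.

    hk, rr     : aligned HK and RR sequences (strings)
    couplings  : array of shape (L_hk, L_rr, q, q), assumed already inferred
    fields_hk  : array of shape (L_hk, q), local fields for HK positions
    fields_rr  : array of shape (L_rr, q), local fields for RR positions
    pairs      : list of (i, j) HK-RR position pairs assumed to encode specificity
    """
    a, b = encode(hk), encode(rr)
    # Sum the couplings over the selected interfacial pairs.
    coupling_term = sum(couplings[i, j, a[i], b[j]] for i, j in pairs)
    # Sum each local field once per position that takes part in any pair.
    hk_positions = sorted({i for i, _ in pairs})
    rr_positions = sorted({j for _, j in pairs})
    field_term = sum(fields_hk[i, a[i]] for i in hk_positions)
    field_term += sum(fields_rr[j, b[j]] for j in rr_positions)
    # Assumed sign convention: more negative means more favorable.
    return -(coupling_term + field_term)
```

Scoring a point mutant then amounts to calling `specificity_score` again with the mutated sequence and comparing the two values, which is one way the network changes after mutation could be computed.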
Enterococcus faecalis is an opportunistic pathogen that can cause bacteremia and endocarditis. Previous studies have shown that concurrent treatment with cephalosporin and vancomycin antibiotics exhibits synergy in vancomycin-resistant E. faecalis, rendering the bacterium susceptible to antibiotic treatment, whereas treatment with each antibiotic separately was not successful. Proteins responsible for mediating vancomycin and cephalosporin resistance are classified as two-component systems (TCS). TCS consist of a histidine kinase that phosphorylates a response regulator after environmental activation. These signaling networks have been shown to exhibit cross-talk interactions, and through direct coupling analysis, we identify encoded specificity between vancomycin resistance TCS, which are horizontally acquired, and cephalosporin resistance TCS, which are endogenous to E. faecalis. To verify that cross-talk between these pathways is responsible for vancomycin and cephalosporin synergy, we use RNA-Seq to identify differentially expressed genes in VanA- and VanB-type vancomycin-resistant enterococci after treatment with the cephalosporin antibiotic ceftriaxone and with vancomycin. We find that cross-talk between VanS A and CroR in strain HIP11704 may be responsible for synergy, demonstrating that horizontally acquired TCS can have large impacts on pre-existing signaling networks. The presence of encoded specificity between exogenous and endogenous TCS shows that the systems co-evolve, and cross-talk between these systems may be exploited to engineer genetic elements that disrupt antibiotic resistance TCS pathways.