Bacterial "stand-alone" response regulators (RRs) are pivotal to the control of gene transcription in response to changing cytosolic and extracellular microenvironments during infection. The genome of group A Streptococcus (GAS) encodes more than 30 stand-alone RRs that orchestrate the expression of virulence factors involved in infecting multiple tissues, thereby causing an array of potentially lethal human diseases. Here, we analysed the molecular epidemiology and biological associations in the coding sequences (CDSs) and upstream intergenic regions (IGRs) of 35 stand-alone RRs from a collection of global GAS genomes. Of the 944 genomes analysed, 97% encoded 32 or more of the 35 tested RRs. The length of RR CDSs ranged from 297 to 1587 nucleotides with an average nucleotide diversity (π) of 0.012, while the IGRs ranged from 51 to 666 nucleotides with an average π of 0.017. We present new evidence of recombination in multiple RRs including mga, leading to mga-2 switching, emm-switching and emm-like gene chimerization, and the first instance of an isolate that encodes both mga-1 and mga-2. Recombination was also evident in the rofA/nra and msmR loci, with 15 emm-types represented in multiple FCT (fibronectin-binding, collagen-binding, T-antigen)-types, including novel emm-type/FCT-type pairings. Strong associations were observed between concatenated RR allele types and emm-type, MLST-type, core genome phylogroup, and country of sampling. No strong associations were observed between individual loci and disease outcome. We propose that 11 RRs may form part of future refinement of GAS typing systems that reflect core genome evolutionary associations. This subgenomic analysis revealed allelic traits that were informative for biological function, GAS strain definition, and regional outbreak detection.
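The nucleotide diversity (π) values quoted above can be read as the average number of differences per site between two sequences drawn from the sample. The following is a minimal sketch of that calculation, not the authors' pipeline: it assumes an ungapped alignment of equal-length sequences, counts every pair of sequences once, and uses made-up toy sequences. The function name `nucleotide_diversity` is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise per-site difference across aligned sequences.

    Assumption: sequences are already aligned, equal in length, and
    contain no gaps or ambiguity codes (these are not handled here).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    total = 0.0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        # Count mismatched positions for this pair, then scale by length.
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        total += diffs / length
        n_pairs += 1
    return total / n_pairs


# Toy alignment (illustrative only, not GAS data).
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.3f}")
```

On this scale, a π of 0.012 corresponds to roughly 12 differences per 1000 aligned sites between a typical pair of sequences.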
Bacteria respond to environmental changes through the co-ordinated regulation of gene expression, often mediated by two-component regulatory systems (TCS). Group A Streptococcus (GAS), a bacterium which infects multiple human body sites and causes multiple diseases, possesses up to 14 TCS. In this study we examined genetic variation in the coding sequences and non-coding DNA upstream of these TCS as a method for evaluating relationships between different GAS emm-types, and potential associations with GAS disease. Twelve of the 14 TCS were present in 90% of the genomes examined. The length of the intergenic regions (IGRs) upstream of TCS coding regions varied from 39 to 345 nucleotides, with an average nucleotide diversity of 0.0064. Overall, IGR allelic variation was generally conserved within an emm-type. Subsequent phylogenetic analysis of concatenated sequences based on all TCS IGR sequences grouped genomes of the same emm-type together. However, grouping with emm-pattern and emm-cluster-types was much weaker, suggesting epidemiological and functional properties associated with the latter are not due to evolutionary relatedness of emm-types. All emm5, emm6 and most of the emm18 genomes, all historically considered rheumatogenic emm-types, clustered together, suggesting a shared evolutionary history. However, emm1, emm3 and several emm18 genomes did not cluster within this group. These latter emm18 isolates were epidemiologically distinct from other emm18 genomes in the study, providing evidence for local variation. emm-types associated with invasive disease or nephritogenicity also did not cluster together. Considering the TCS coding sequences (cds), correlation with emm-type was weaker than for the IGRs, and no strong correlation with disease was observed. A deletion of the malate transporter gene, maeP, was identified that serves as a putative marker for the emm89.0 subtype, which has been implicated in invasive outbreaks. A recombination-related, subclade-forming DNA motif was identified in the putative receiver domain of the Spy1556 response regulator that correlated with throat-associated emm-pattern-type A-C strains.
Group A Streptococcus (GAS) is a globally significant bacterial pathogen. The GAS genotyping gold standard characterises the nucleotide variation of emm, which encodes a surface-exposed protein that is recombinogenic and under immune-based selection pressure. Within a supervised learning methodology, we tested three random forest (RF) algorithms (Guided, Ordinary, and Regularized) and 53 GAS response regulator (RR) allele types to infer six genomic traits (emm-type, emm-subtype, tissue and country of sample, clinical outcomes, and isolate invasiveness). The Guided, Ordinary, and Regularized RF classifiers inferred the emm-type with accuracies of 96.7%, 95.7%, and 95.2%, using ten, three, and four RR alleles in the feature set, respectively. Notably, we inferred the emm-type with 93.7% accuracy using only mga2 and lrp. We demonstrated a utility for inferring emm-subtype (89.9%), country (88.6%), and invasiveness (84.7%), but not clinical outcome (56.9%) or tissue (56.4%), which is consistent with the complexity of GAS pathophysiology. We identified a novel cell wall-spanning domain (SF5), and proposed evolutionary pathways depicting the ‘contrariwise’ and ‘likewise’ chimeric deletion-fusion of emm and enn. We identified an intermediate strain, which provides evidence of the time-dependent excision of mga regulon genes. Overall, our workflow advances the understanding of the GAS mga regulon and its plasticity.
Group A Streptococcus (GAS) is a globally significant human pathogen. The extensive variability of the GAS genome, virulence phenotypes and clinical outcomes renders it an excellent candidate for the application of genotype-phenotype association studies in the era of whole-genome sequencing. We have catalogued the distribution and diversity of the transcription regulators of GAS, and employed phylogenetics, concordance metrics and machine learning (ML) to test for associations. In this review, we communicate the lessons learnt in the context of recent bacterial genotype-phenotype association studies by others that have utilised both genome-wide association studies (GWAS) and ML. We envisage a promising future for the application of GWAS in bacterial genotype-phenotype association studies and foresee the increasing use of ML. However, progress in this field is hindered by several outstanding bottlenecks. These include the shortcomings that are observed when GWAS techniques that have been fine-tuned on human genomes are applied to bacterial genomes. Furthermore, there is a deficit of easy-to-use end-to-end workflows, and a lag in the collection of detailed phenotype and clinical genomic metadata. We propose a novel quality control protocol for the collection of high-quality GAS virulence phenotype data coupled to clinical outcome data. Finally, we incorporate this protocol into a workflow for testing genotype-phenotype associations using ML and ‘linked’ patient-microbe genome sets that better represent the infection event.