2021
DOI: 10.1038/s41598-021-91941-6

Application of the random forest algorithm to Streptococcus pyogenes response regulator allele variation: from machine learning to evolutionary models

Abstract: Group A Streptococcus (GAS) is a globally significant bacterial pathogen. The GAS genotyping gold standard characterises the nucleotide variation of emm, which encodes a surface-exposed protein that is recombinogenic and under immune-based selection pressure. Within a supervised learning methodology, we tested three random forest (RF) algorithms (Guided, Ordinary, and Regularized) and 53 GAS response regulator (RR) allele types to infer six genomic traits (emm-type, emm-subtype, tissue and country of sample, c…

Cited by 3 publications (6 citation statements)
References 43 publications
“…We observed a strain-dependent variability in the IGRs and coding sequences of GAS TCSs and TRs using phylogenetics and concordance, and proposed a set of core TRs as candidates for a novel GAS typing system. These subsequently informed the design of our ML workflow, in which we were able to predict the GAS strain (emm type) with 97% accuracy and establish that mga2 and lrp were the most mathematically powerful predictors of strain (Buckley et al, 2021). Overall this finding was important because it revealed a backward-compatibility between our TR-based typing system and the vast emm-based knowledge set.…”
Section: Lesson 2: Bacterial Phylogenetic Delineation Needs a 'Wgs' Redomentioning
confidence: 78%
“…Collectively, these findings were significant because it has been suggested (Lees et al, 2019) that an ability to detect rare genotype anomalies enhances the discovery of rare clinically-relevant phenotypes. We were also able to predict the country of origin using this approach, suggesting a geography-dependent evolution of GAS TRs (Buckley et al, 2021).…”
Section: Lesson 2: Bacterial Phylogenetic Delineation Needs a 'Wgs' Redomentioning
confidence: 82%