2018
DOI: 10.1371/journal.pgen.1007333

Machine learning identifies signatures of host adaptation in the bacterial pathogen Salmonella enterica

Abstract: Emerging pathogens are a major threat to public health, however understanding how pathogens adapt to new niches remains a challenge. New methods are urgently required to provide functional insights into pathogens from the massive genomic data sets now being generated from routine pathogen surveillance for epidemiological purposes. Here, we measure the burden of atypical mutations in protein coding genes across independently evolved Salmonella enterica lineages, and use these as input to train a random forest c…

Cited by 78 publications (101 citation statements)
References 84 publications
“…For example, Wheeler et al. used random forests to predict invasiveness of Salmonella enterica lineages [14]. In another study, a tree ensemble was trained with boosting to predict the minimum inhibitory concentration from DNA k-mers for a large-scale Klebsiella pneumoniae panel [10], but the value of using core genome compared to accessory genes was not investigated.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…As general-purpose methods, they are agnostic to the causal mechanisms, and learn useful features directly from data [7-9]. Already, decision tree based models have proven valuable for predicting resistance and pathogen invasiveness from genomic sequences [10-14]. However, these studies were limited in both the genetic features used and the methods applied.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Notably, the relative contribution of the different information sources to the susceptibility and resistance sensitivity strongly depended on the antibiotic. To assess the effect of the classification technique, we compared the performance of an SVM classifier with a linear kernel to that of random forests, and logistic regression, which we and others have used successfully for related phenotype prediction problems (Her and Wu 2018; Asgari et al. 2018; Wheeler, Gardner, and Barquist 2018). For this purpose we used the data type combination with the best macro F1-score in resistance prediction with the SVM.…”
Section: Figure
Citation type: mentioning
confidence: 99%
“…Researchers have also begun to apply machine learning techniques to predict bacterial pathogenicity. Examples include using discriminatory single nucleotide variants (SNVs) to predict Staphylococcus aureus in vitro cytotoxicity [27], using variation in core genome loci to predict patient mortality in specific S. aureus clones [28], and using predicted perturbations in protein coding sequences to classify Salmonella strains as causing either gastrointestinal or extraintestinal infections [29]. A support vector machine approach has been used to distinguish the transcriptomes of P. aeruginosa in human infection compared to in vitro growth [30].…”
Section: Introduction
Citation type: mentioning
confidence: 99%