2021
DOI: 10.3389/fgene.2021.642282

Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier

Abstract: Microbiome samples harvested from urban environments can be informative in predicting the geographic location of unknown samples. The idea that different cities may have geographically disparate microbial signatures can be utilized to predict the geographical location based on city-specific microbiome samples. We implemented this idea first by utilizing standard bioinformatics procedures to pre-process the raw metagenomics samples provided by the CAMDA organizers. We trained several component classifiers and …

Cited by 8 publications (7 citation statements)
References 38 publications
“…Furthermore, certain variations are intended to reducing the computation time. In particular, instead of using reads directly, researchers used k‐mers, constructed by splitting the reads into smaller sequences such as 24 bp, as classifiers (Anyaso‐Samuel et al, 2021; Huang et al, 2020). Some pipelines reduce the data by excluding features from the abundance tables; for example, some genera and features unable to distinguish samples are excluded (Casimiro‐Soriguer et al, 2019; Walker et al, 2018), or dimension reduction methods, such as principal component analysis (PCA), are used.…”
Section: Subway Citizen
Citation type: mentioning
confidence: 99%
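The k-mer construction mentioned in the statement above can be illustrated with a minimal sketch. It assumes the common sliding-window definition (every overlapping substring of length k, step 1); the statement does not say whether the cited pipelines used overlapping or non-overlapping pieces, and the function names `kmers` and `kmer_counts` are hypothetical, not taken from any cited work.

```python
from collections import Counter


def kmers(read: str, k: int = 24):
    """Yield every overlapping k-mer of `read` (sliding window, step 1).

    Assumption: overlapping windows; reads shorter than k yield nothing.
    """
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def kmer_counts(reads, k: int = 24) -> Counter:
    """Count k-mers across a collection of reads (case-insensitive)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read.upper(), k))
    return counts


# Toy reads with a small k so the result is easy to inspect;
# the statement above mentions k = 24 for real data.
reads = ["ACGTACGTAC", "CGTACGTACG"]
counts = kmer_counts(reads, k=4)
```

A table of such counts per sample is one possible feature matrix for a classifier; how the cited studies actually built their features is not stated here.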
“…Avoiding the pooling of data from different studies can bypass the study-specific effect issue, though greatly reduces the statistical power with negative effects on the reliability of the outcome. Additionally, microbiome data commonly suffer from imbalanced sample distribution (Khan and Kelly, 2020; Poore et al, 2020; Anyaso-Samuel et al, 2021). Particularly in (binary) classification applications, it is commonly the case that one class is overrepresented (majority class) while the other is underrepresented (minority class).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…This challenge exhorts a comprehensive examination of anti‐microbial resistance (AMR) patterns in this vast metagenomic surveillance data, attracting dedicated researchers striving to uncover these complex interactions. Unlike previous studies that have predominantly focused on geolocation prediction [4–6] or spatial modelling [7,8] of such patterns, our research forges a new path by delving into the uncharted territory of bacteriophages' role in orchestrating AMR dissemination. Bacteriophages, also known as phages, are viruses that prey on bacteria [9].…”
Section: Introduction
Citation type: mentioning
confidence: 99%