Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier

Anyaso-Samuel, Samuel; Sachdeva, Archie; Guha, Subharup; Datta, Somnath

doi:10.3389/fgene.2021.642282

Cited by 8 publications

(7 citation statements)

References 38 publications


“…Furthermore, certain variations are intended to reducing the computation time. In particular, instead of using reads directly, researchers used k‐mers, constructed by splitting the reads into smaller sequences such as 24 bp, as classifiers (Anyaso‐Samuel et al, 2021; Huang et al, 2020). Some pipelines reduce the data by excluding features from the abundance tables; for example, some genera and features unable to distinguish samples are excluded (Casimiro‐Soriguer et al, 2019; Walker et al, 2018), or dimension reduction methods, such as principal component analysis (PCA), are used.…”
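The quoted passage describes replacing raw reads with k-mers as input features. The snippet below is a minimal sketch that assumes the usual overlapping sliding-window definition of a k-mer; the quote's "splitting" could also be read as non-overlapping chunks. It counts k-mers per read and pools them into one sample-level profile. The names `kmer_counts` and `kmer_profile` are hypothetical. Skipping windows that contain non-ACGT characters (such as `N`) is an illustrative assumption, not something the quote specifies.

```python
from collections import Counter
from typing import Iterable

VALID_BASES = set("ACGT")


def kmer_counts(read: str, k: int) -> Counter:
    """Count overlapping k-mers (window width k, step 1) in one read.

    Windows containing characters other than A/C/G/T (e.g. N) are skipped;
    this filtering rule is an illustrative assumption.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    read = read.upper()
    counts = Counter()
    # A read of length L yields L - k + 1 windows (none if L < k).
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def kmer_profile(reads: Iterable[str], k: int) -> Counter:
    """Pool k-mer counts over all reads of one sample into a single profile."""
    profile = Counter()
    for read in reads:
        profile.update(kmer_counts(read, k))
    return profile


if __name__ == "__main__":
    sample = ["ACGTAC", "acgtNacg"]
    print(kmer_profile(sample, 3))
```

The toy example uses k = 3 so that the output is easy to read. The quote mentions 24 bp sequences, and with k = 24 the number of possible k-mers (4^24) is large, which is one reason such pipelines pair k-mer features with feature filtering or dimension reduction.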

Section: Subway Citizen (mentioning; confidence: 99%)

Where environmental microbiome meets its host: Subway and passenger microbiome relationships

Peimbert, Alcaraz, Vega, 2022, Molecular Ecology


Subways are urban transport systems with high capacity. Every day around the world, there are more than 150 million subway passengers. Since 2013, thousands of microbiome samples from various subways worldwide have been sequenced. Skin bacteria and environmental organisms dominate the subway microbiomes. The literature has revealed common bacterial groups in subway systems; even so, it is possible to identify cities by their microbiome. Low-frequency bacteria are responsible for specific bacterial fingerprints of each subway system. Furthermore, daily subway commuters leave their microbial clouds and interact with other passengers. Microbial exchange is quite fast; the hand microbiome changes within minutes, and after cleaning the handrails, the bacteria are re-established within minutes. To investigate new taxa and metabolic pathways of subway microbial communities, several high-quality metagenome-assembled genomes (MAGs) have been described. Subways are harsh environments unfavorable for microorganism growth. However, recent studies have observed a wide diversity of viable and metabolically active bacteria. Understanding which bacteria are living, dormant, or dead allows us to propose realistic ecological interactions. Questions regarding the relationship between humans and the subway microbiome, particularly the microbiome effects on personal and public health, remain unanswered. This review summarizes our knowledge of subway microbiomes and their relationship with passenger microbiomes.


“…Avoiding the pooling of data from different studies can bypass the study-specific effect issue, though greatly reduces the statistical power with negative effects on the reliability of the outcome. Additionally, microbiome data commonly suffer from imbalanced sample distribution (Khan and Kelly, 2020; Poore et al, 2020; Anyaso-Samuel et al, 2021). Particularly in (binary) classification applications, it is commonly the case that one class is overrepresented (majority class) while the other is underrepresented (minority class).…”

Section: Introduction (mentioning; confidence: 99%)

Prediction of Smoking Habits From Class-Imbalanced Saliva Microbiome Data Using Data Augmentation and Machine Learning

et al. 2022


Human microbiome research is moving from characterization and association studies to translational applications in medical research, clinical diagnostics, and others. One of these applications is the prediction of human traits, where machine learning (ML) methods are often employed, but face practical challenges. Class imbalance in available microbiome data is one of the major problems, which, if unaccounted for, leads to spurious prediction accuracies and limits the classifier's generalization. Here, we investigated the predictability of smoking habits from class-imbalanced saliva microbiome data by combining data augmentation techniques to account for class imbalance with ML methods for prediction. We collected publicly available saliva 16S rRNA gene sequencing data and smoking habit metadata demonstrating a serious class imbalance problem, i.e., 175 current vs. 1,070 non-current smokers. Three data augmentation techniques (synthetic minority over-sampling technique, adaptive synthetic, and tree-based associative data augmentation) were applied together with seven ML methods: logistic regression, k-nearest neighbors, support vector machine with linear and radial kernels, decision trees, random forest, and extreme gradient boosting. K-fold nested cross-validation was used with the different augmented data types and baseline non-augmented data to validate the prediction outcome. Combining data augmentation with ML generally outperformed baseline methods in our dataset. The final prediction model combined tree-based associative data augmentation and support vector machine with linear kernel, and achieved a classification performance expressed as Matthews correlation coefficient of 0.36 and AUC of 0.81. Our method successfully addresses the problem of class imbalance in microbiome data for reliable prediction of smoking habits.
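The abstract lists the synthetic minority over-sampling technique (SMOTE) among the augmentation methods. The snippet below is a minimal sketch of SMOTE-style interpolation, not the study's code. The study's final model used tree-based associative data augmentation, which is not shown here. Each synthetic row is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote`, the Euclidean distance, and the default `k=5` are assumptions for illustration. For production use, the `SMOTE` class in the imbalanced-learn package (`imblearn.over_sampling`) is the maintained implementation.

```python
import numpy as np


def smote(X_min: np.ndarray, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate synthetic minority-class rows by SMOTE-style interpolation.

    Each synthetic row is x + u * (x_nn - x), where x is a random minority row,
    x_nn is one of its k nearest minority-class neighbours (Euclidean distance),
    and u is drawn uniformly from [0, 1).
    """
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances among minority samples; a row is not its own neighbour.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_synthetic)                  # seed rows
    nn = neighbours[base, rng.integers(0, k, size=n_synthetic)]  # one neighbour per seed
    gap = rng.random((n_synthetic, 1))                            # interpolation weights
    return X_min[base] + gap * (X_min[nn] - X_min[base])


if __name__ == "__main__":
    # Toy relative-abundance profiles (rows sum to 1) for a minority class.
    X_min = np.array([[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]])
    print(smote(X_min, n_synthetic=4, k=2, seed=1).round(3))
```

Each synthetic row is a convex combination of two real profiles, so relative-abundance rows that sum to 1 still sum to 1 after interpolation. Fully balancing the 175 current vs. 1,070 non-current smokers described above would require 1,070 − 175 = 895 synthetic minority rows. Oversampling is generally applied only within training folds, so that synthetic points derived from validation samples do not leak into model evaluation.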


“…This challenge exhorts a comprehensive examination of anti‐microbial resistance (AMR) patterns in this vast metagenomic surveillance data, attracting dedicated researchers striving to uncover these complex interactions. Unlike previous studies that have predominantly focused on geolocation prediction [4–6] or spatial modelling [7,8] of such patterns, our research forges a new path by delving into the uncharted territory of bacteriophages' role in orchestrating AMR dissemination. Bacteriophages, also known as phages, are viruses that prey on bacteria [9].…”

Section: Introduction (mentioning; confidence: 99%)

Multiblock partial least squares and rank aggregation: Applications to detection of bacteriophages associated with antimicrobial resistance in the presence of potential confounding factors

Sarkar, Anyaso‐Samuel, Qiu, et al., 2024

Statistics in Medicine

Self Cite


Urban environments, characterized by bustling mass transit systems and high population density, host a complex web of microorganisms that impact microbial interactions. These urban microbiomes, influenced by diverse demographics and constant human movement, are vital for understanding microbial dynamics. We explore urban metagenomics, utilizing an extensive dataset from the Metagenomics & Metadesign of Subways & Urban Biomes (MetaSUB) consortium, and investigate antimicrobial resistance (AMR) patterns. In this pioneering research, we delve into the role of bacteriophages, or “phages”–viruses that prey on bacteria and can facilitate the exchange of antibiotic resistance genes (ARGs) through mechanisms like horizontal gene transfer (HGT). Despite their potential significance, existing literature lacks a consensus on their significance in ARG dissemination. We argue that they are an important consideration. We uncover that environmental variables, such as those on climate, demographics, and landscape, can obscure phage‐resistome relationships. We adjust for these potential confounders and clarify these relationships across specific and overall antibiotic classes with precision, identifying several key phages. Leveraging machine learning tools and validating findings through clinical literature, we uncover novel associations, adding valuable insights to our comprehension of AMR development.
