2020
DOI: 10.3389/fmolb.2020.610845

Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction

Abstract: Microbiome composition profiles generated from 16S rRNA sequencing have been extensively studied for their usefulness in phenotype trait prediction, including for complex diseases such as diabetes and obesity. These microbiome compositions have typically been quantified in the form of Operational Taxonomic Unit (OTU) count matrices. However, alternate approaches such as Amplicon Sequence Variants (ASV) have been used, as well as the direct use of k-mer sequence counts. The overall effect of these different typ…
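The abstract contrasts OTU and ASV count matrices with the direct use of k-mer sequence counts. As a minimal sketch of what a k-mer count profile is, assuming overlapping k-mers over the A/C/G/T alphabet, a default of k = 4, and the hypothetical helpers `kmer_counts` and `sample_profile` (none of which come from the paper), one sample's row of a k-mer count matrix could be built like this:

```python
from collections import Counter


def kmer_counts(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers in one read (hypothetical helper; k=4 is an assumed default)."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def sample_profile(reads, k: int = 4) -> Counter:
    """Sum k-mer counts over all reads of one sample: one row of a k-mer count matrix."""
    profile: Counter = Counter()
    for read in reads:
        profile.update(kmer_counts(read, k))
    return profile


print(kmer_counts("ACGTACGT", k=3))
```

For the read `ACGTACGT` with k = 3, `ACG` and `CGT` are each counted twice and `GTA` and `TAC` once. Unlike OTU or ASV tables, such a profile needs no clustering or denoising step, but its number of columns grows as 4^k.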

Cited by 12 publications (13 citation statements)
References 29 publications
“…To reduce the number of features analysed by the classifiers a two-step feature selection method was applied: firstly, k-mers with a p-value greater than 0.05 according to the chi-square test were discarded [ 45 , 46 ] and then those remaining were used as input to an Extra Tree Classifier. K-mers with a Gini feature importance above the overall mean were selected [ 47 , 48 ].…”
Section: Results (mentioning, confidence: 99%)
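The two-step feature selection described in this citation statement can be sketched with scikit-learn, assuming its `chi2` scorer for step one and `ExtraTreesClassifier` (whose default split criterion is Gini impurity) for step two. The helper name `two_step_select`, the number of trees, the random seed, and the early return when one or no feature survives step one are all illustrative assumptions, not details taken from the citing paper:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import chi2


def two_step_select(X, y, alpha=0.05, n_estimators=200, random_state=0):
    """Return indices of columns kept by a chi-square filter followed by an
    Extra Trees importance filter (hypothetical helper, not the authors' code)."""
    X = np.asarray(X, dtype=float)
    # Step 1: chi-square test of each non-negative feature (e.g. k-mer counts)
    # against the class labels; discard features whose p-value exceeds alpha.
    _, pvals = chi2(X, y)
    step1 = np.flatnonzero(pvals <= alpha)
    if step1.size <= 1:
        return step1  # assumed behaviour: nothing left to rank against a mean
    # Step 2: fit Extra Trees (Gini impurity is scikit-learn's default criterion)
    # on the surviving features and keep those with above-mean importance.
    forest = ExtraTreesClassifier(n_estimators=n_estimators, random_state=random_state)
    forest.fit(X[:, step1], y)
    importances = forest.feature_importances_
    return step1[importances > importances.mean()]


# Hypothetical data: 200 samples, 50 count features, only column 0 informative.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.poisson(5, size=(200, 50)).astype(float)
X[:, 0] += 20 * y
kept = two_step_select(X, y)
print(kept)
```

The returned indices refer to columns of the original matrix, so they can be used directly to subset new data before training the final classifiers. Note that scikit-learn's `chi2` requires non-negative inputs, which suits raw k-mer counts.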
“…We found that the microbiome data transformation had an inconsistent effect on a model performance, as it improved the AUC of some models (e.g., CF using genus- and family-level data, SVM using family-level data) while it decreased the AUC of other models (e.g., RRF using genus- and family-level data, SVM using genus-level data). Previous studies on gut microbiome data emphasized the importance of data transformation due to the compositionality of the microbiome data (28, 30, 32). However, they also reported that the performance of tree-based algorithms (i.e., random forest and XGBoost) was not significantly affected by data transformation (32).…”
Section: Discussion (mentioning, confidence: 99%)
“…Previous studies on gut microbiome data emphasized the importance of data transformation due to the compositionality of the microbiome data (28,30,32). However, they also reported that the performance of tree-based algorithms (i.e., random forest and XGBoost) was not significantly affected by data transformation (32).…”
Section: Data Transformation Did Not Have a Notable Effect On The Per... (mentioning, confidence: 99%)
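The statements above refer to data transformations motivated by the compositionality of microbiome data without naming a specific one. As an illustration only, here is a minimal sketch of the centered log-ratio (CLR) transform, a commonly used compositional transform, assuming a samples × taxa count matrix, a pseudocount of 1 by default, and a hypothetical helper name `clr`:

```python
import numpy as np


def clr(counts, pseudocount: float = 1.0) -> np.ndarray:
    """Centered log-ratio transform of a samples x taxa count matrix.

    A pseudocount is added first because log(0) is undefined; its value is an
    assumption and does affect the result for sparse data.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log abundance (the log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)


counts = np.array([[1, 2, 4], [10, 10, 10]])
print(clr(counts, pseudocount=0))
```

With no pseudocount, the first row becomes approximately [-0.693, 0, 0.693] (that is, [-ln 2, 0, ln 2]) and the perfectly even second row becomes approximately zero. Multiplying a row by a constant (for example, a deeper sequencing run) leaves its CLR values unchanged, which is the scale invariance that motivates such transforms. One possible reason tree-based models are less sensitive: they split on per-feature thresholds, so any transform that is strictly increasing within each feature (such as a log of relative abundances) leaves the achievable splits unchanged, although CLR itself is not of that kind because it subtracts a per-sample mean.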
“…In this article, we propose a standardized approach for evaluating the performance and generalizability of data processing pipelines and ML models with microbiome data to classify patients with IBD. Previous microbiome ML benchmarking studies focused on performance of various combinations of model type, normalization or transformation, and microbiome compositional features using variations of fivefold cross validation (Song et al, 2020;Topçuoğlu et al, 2020). Five-fold cross validation fails to assess the generalizability to new, unseen sample batches as each split potentially contains samples from all batches present in the dataset.…”
Section: Introduction (mentioning, confidence: 99%)
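The last statement argues that five-fold cross-validation cannot measure generalization to unseen sample batches. One common alternative, assumed here rather than taken from the citing paper, is leave-one-batch-out cross-validation, sketched below with scikit-learn's `LeaveOneGroupOut`. The random forest model, the hypothetical helper `leave_one_batch_out_auc`, and the synthetic batches are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score


def leave_one_batch_out_auc(X, y, batches, random_state=0):
    """Return one ROC AUC per held-out batch (hypothetical helper).

    Each fold trains on every batch except one and tests on the batch it left
    out, so test samples always come from a batch the model never saw.
    Every batch must contain both classes, or ROC AUC is undefined for that fold.
    """
    model = RandomForestClassifier(n_estimators=200, random_state=random_state)
    return cross_val_score(model, X, y, groups=batches,
                           cv=LeaveOneGroupOut(), scoring="roc_auc")


# Hypothetical data: 3 sequencing batches of 40 samples, both classes in each.
rng = np.random.default_rng(1)
batches = np.repeat(["batch_a", "batch_b", "batch_c"], 40)
y = np.tile([0, 1], 60)
X = rng.normal(size=(120, 10))
X[:, 0] += 2.0 * y  # one informative feature
scores = leave_one_batch_out_auc(X, y, batches)
print(scores)  # one AUC per held-out batch
```

Replacing `LeaveOneGroupOut()` with a plain `StratifiedKFold(n_splits=5)` would recover the five-fold setup criticized in the quote, where every test fold can contain samples from every batch. The spread of the per-batch scores is itself informative about batch effects.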