2022
DOI: 10.3389/fgene.2022.784397

Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease

Abstract: Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. In order to reduce time until diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the gut microbiome’s composition are currently being explored. To date, these models have had limited clinical application due to decreased performance when applied to a new cohort of patient samples. Various method…

Cited by 19 publications (30 citation statements)
References 107 publications

“…Others have selected more bacterial markers as biomarkers for IBD determination using various feature selection methods and ML models. [59][60][61] Manandhar et al 53 Metagenomic sequencing, which has a higher taxonomy resolution that enables better identification of specific bacterial species or strains related to disease development, is increasingly applied for the discovery of microbial markers. Franzosa et al 64 genus data (AUC = 0.869) achieved higher pediatric UC prediction performance than shotgun species data (AUC = 0.763) and pathway data (AUC = 0.764).…”
Section: Disease Diagnosis (mentioning)
confidence: 99%
“…Others have selected more bacterial markers as biomarkers for IBD determination using various feature selection methods and ML models. 59, 60, 61 Manandhar et al. 53 selected 50 fecal bacterial taxa for disease diagnosis in a large American cohort.…”
Section: Role Of Gut Microbiome In IBD Diagnosis (mentioning)
confidence: 99%
“…These methods are univariate and were originally developed for gene expression data from microarray or RNA-sequencing. They have been used extensively in microbiome studies [30–32, 46, 47] even though they would require further developments to be adapted to the inherent characteristics of microbiome data. These methods’ limitations include the inability to deal with non-Gaussian distribution, small sample sizes and dependence between microbial variables.…”
Section: Methods (mentioning)
confidence: 99%
“…We assess the performance of PLSDA-batch in extensive simulation studies and three case studies that investigate microbial communities in sponge tissues, anaerobic digestion conditions and diet types in mice. We compare the efficiency of our approaches in removing batch effects and uncovering treatment effects with popular linear methods that have been previously applied in microbial studies [30–32], such as ComBat and removeBatchEffect. As our approach shares some similarities with Surrogate Variable Analysis (SVA), besides the fact that it accounts, rather than corrects for batch effects, we include some comparisons in the simulation studies.…”
Section: Introduction (mentioning)
confidence: 99%
“…Model performance can also be affected by microbiome data preprocessing, such as data normalization (28). The latter is commonly carried out to account for potential differences in sample library sizes; thus, its effect on prediction accuracy needs to be assessed (31 33).…”
Section: Introduction (mentioning)
confidence: 99%