Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. To reduce the time to diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the gut microbiome’s composition are currently being explored. To date, these models have had limited clinical application because their performance decreases when they are applied to a new cohort of patient samples. Various methods have been developed to analyze microbiome data, and these may improve the generalizability of machine learning IBD diagnostic tests. With an abundance of methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (from data processing to training a machine learning model) for microbiome-based IBD diagnostic tools. We collected fifteen 16S rRNA microbiome datasets (7,707 samples) from North America to benchmark combinations of gut microbiome features, data normalization and transformation methods, batch effect correction methods, and machine learning models. Pipeline generalizability to new cohorts of patients was evaluated with two binary classification metrics following leave-one-dataset-out (LODO) cross-validation, in which all samples from one study were left out of the training set and used for testing. We demonstrate that taxonomic features processed with a compositional transformation method and batch effect correction with the naive zero-centering method attain the best classification performance. In addition, machine learning models that identify non-linear decision boundaries between labels are more generalizable than those that are linearly constrained. Lastly, we illustrate the importance of generating a curated training dataset to ensure similar performance across patient demographics.
These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
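The leave-one-dataset-out scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `lodo_splits`, the study labels, and the toy sample list are all hypothetical.

```python
from collections import defaultdict


def lodo_splits(dataset_ids):
    """Yield (held_out, train_idx, test_idx) for leave-one-dataset-out CV.

    dataset_ids holds one study label per sample (hypothetical labels).
    Every sample from the held-out study goes to the test fold, so no
    study contributes to both training and testing within a split.
    """
    by_dataset = defaultdict(list)
    for i, dataset in enumerate(dataset_ids):
        by_dataset[dataset].append(i)

    for held_out in sorted(by_dataset):
        test_idx = by_dataset[held_out]
        train_idx = sorted(
            i
            for dataset, indices in by_dataset.items()
            if dataset != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Toy example: five samples drawn from three hypothetical studies.
dataset_ids = ["study_A", "study_A", "study_B", "study_C", "study_B"]
splits = list(lodo_splits(dataset_ids))

for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by study rather than by sample is what makes the held-out score an estimate of performance on an unseen cohort: a random split would leave samples from the same study on both sides of the fold.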
Background
Inflammatory bowel disease (IBD) patients wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. To reduce the time to diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the gut microbiome’s composition are currently being explored. To date, these models have had limited clinical application because their performance decreases when they are applied to a new cohort of patient samples. Various methods have been developed to analyze microbiome data, and these may improve the generalizability of machine learning IBD diagnostic tests. With an abundance of methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (from data processing to training a machine learning model) for microbiome-based IBD diagnostic tools.
Results
We collected fifteen 16S rRNA microbiome datasets (7707 samples) from North America to benchmark combinations of gut microbiome features, data normalization methods, batch effect reduction methods, and machine learning models. Pipeline generalizability to new cohorts of patients was evaluated with four binary classification metrics following leave-one-dataset-out cross-validation, in which all samples from one study were left out of the training set and used for testing. We demonstrate that taxonomic features obtained from QIIME2 lead to better classification of samples from IBD patients than inferred functional features obtained from PICRUSt2. In addition, machine learning models that identify non-linear decision boundaries between labels are more generalizable than those that are linearly constrained. Prior to training a non-linear machine learning model on taxonomic features, it is important to apply a compositional normalization method and remove batch effects with the naive zero-centering method. Lastly, we illustrate the importance of generating a curated training dataset to ensure similar performance across patient demographics.
Conclusions
These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
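As a rough illustration of the preprocessing named in the results, the sketch below applies a centered log-ratio (CLR) transform, one common compositional transformation, and then zero-centers each feature within each dataset. It assumes that zero-centering means subtracting the per-dataset feature mean. The pseudocount of 1, the function names, and the toy counts are assumptions for illustration and are not taken from the study.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's feature counts.

    The pseudocount (an assumed value) avoids taking log(0) for taxa
    that were not observed in the sample.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def zero_center_by_batch(rows, batch_ids):
    """Subtract each batch's per-feature mean from that batch's samples."""
    n_features = len(rows[0])
    sums, sizes = {}, {}
    for row, batch in zip(rows, batch_ids):
        totals = sums.setdefault(batch, [0.0] * n_features)
        for j, value in enumerate(row):
            totals[j] += value
        sizes[batch] = sizes.get(batch, 0) + 1

    return [
        [value - sums[batch][j] / sizes[batch] for j, value in enumerate(row)]
        for row, batch in zip(rows, batch_ids)
    ]


# Toy example: four samples, three taxa, two hypothetical studies.
counts = [[10, 0, 5], [3, 7, 0], [8, 2, 2], [1, 1, 9]]
batch_ids = ["study_A", "study_A", "study_B", "study_B"]

transformed = [clr(row) for row in counts]
corrected = zero_center_by_batch(transformed, batch_ids)
```

After the second step every feature has mean zero within each study, which removes study-level offsets. It also removes real differences between studies when their case and control proportions differ.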
The prevalence of inflammatory bowel disease (IBD) is increasing throughout the developed world. For the newly diagnosed, the interval between the appearance of symptoms and diagnosis can span months and involve invasive procedures. There is an urgent need to develop a simple, low-cost, accurate and non-invasive diagnostic test. With decreasing costs of next-generation sequencing, many studies have compared IBD gut microbiomes to healthy controls, successfully identifying bacterial biomarkers for IBD. Unfortunately, most of these studies apply machine learning and statistical methods to either a single dataset or datasets with small sample sizes. The resulting disease classification models are heavily overfitted and therefore have minimal clinical application to new patient cohorts. Several data preprocessing methods are available for data normalization and reduction of cohort-specific signals (batch reduction), which can address this lack of cross-dataset performance. With an abundance of potential methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (combination of data preprocessing and model) for microbiome-based IBD diagnostic tools. We used a collection of 12 IBD-associated North American microbiome datasets (~4000 samples) to benchmark several machine learning pipelines. Raw sequencing data were processed, collapsed at the OTU or genus level and merged using QIIME2. Datasets were then normalized using either sum-scaling or log-based methods, and batch reduction was performed using either zero-centering or Empirical Bayes approaches. Performance of pipelines was evaluated using binary accuracy, AUC, F1 score and MCC. Generalizability of pipelines was evaluated using leave-one-dataset-out cross-validation, in which data from one study were left out of the training set and used for testing.
The best performing and most generalizable pipeline included a Random Forest model paired with centered log-ratio normalization and batch reduction via an Empirical Bayes-based approach. This combination, along with others, performed as well as or better than more complex models involving deep neural networks (DNNs). In addition to benchmarking our pipelines, we also explore their limitations, such as the reliance of zero-centered batch reduction on balanced input data and the tendency of Empirical Bayes-based methods to introduce artificial signals into data, which suggests that certain methods are poorly suited to clinical use. To our knowledge, this is the first comprehensive benchmark of data preprocessing and machine learning methods for microbiome-based disease classification of IBD. These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
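The four metrics named above (binary accuracy, AUC, F1 and MCC) can be computed from true labels, predicted labels and predicted scores as in the following sketch. The function names, the 0.5 decision threshold and the toy data are assumptions for illustration. Returning 0.0 when a denominator is zero is a convention chosen here, not something the study specifies.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, F1 and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


def auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Tied scores count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy held-out fold: 1 = IBD, 0 = control (hypothetical scores).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Accuracy alone can look strong on an imbalanced held-out study, whereas MCC uses all four cells of the confusion matrix, which is one reason to report several metrics together.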