Autoimmune diseases are chronic, multifactorial conditions. Through machine learning (ML), a branch of the wider field of artificial intelligence, it is possible to extract patterns within patient data and exploit these patterns to predict patient outcomes for improved clinical management. Here, we surveyed the use of ML methods to address clinical problems in autoimmune disease. A systematic review was conducted using the MEDLINE, Embase and Computers and Applied Sciences Complete databases. Papers were eligible if "machine learning" or "artificial intelligence" and the autoimmune disease search term(s) appeared in their title, abstract or keywords. Exclusion criteria were: studies not written in English, studies without real human patient data, publication before 2001, studies that were not peer reviewed, non-autoimmune disease comorbidity research, and review papers. Of 702 studies, 169 met the criteria for inclusion. Support vector machines and random forests were the most popular ML methods. ML models built on multiple sclerosis, rheumatoid arthritis and inflammatory bowel disease data were the most common. A small proportion of studies (7.7%, 13/169) combined different data types in the modelling process. Cross-validation combined with a separate test set, which gives a more robust model evaluation, was used in 8.3% of papers (14/169). The field may benefit from adopting a best practice of validation, cross-validation and independent testing of ML models. Many models achieved good predictive results in simple scenarios (e.g. classification of cases and controls). Progression to more complex predictive models may be achievable in future through integration of multiple data types.
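A minimal sketch of the validation scheme recommended above (an independent test set held out first, with k-fold cross-validation on the remaining training data), assuming scikit-learn. Synthetic data from `make_classification` stands in for patient data, and a random forest is used only because the review reports it as a popular method; the sample size, fold count and seed are illustrative choices, not values from the review.

```python
# Sketch: k-fold cross-validation on a training split plus an untouched test set.
# Assumptions: synthetic data replaces patient data; random forest chosen for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split


def evaluate(random_state: int = 0) -> tuple[float, float]:
    """Return (mean cross-validated AUROC on the training split, AUROC on the held-out test split)."""
    X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                               random_state=random_state)
    # 1. Hold out an independent test set before any model development.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    model = RandomForestClassifier(n_estimators=200, random_state=random_state)
    # 2. Cross-validate on the training split only; any tuning would happen at this stage.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
    # 3. Refit on the whole training split and score once on the untouched test set.
    model.fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return float(cv_scores.mean()), float(test_auc)


if __name__ == "__main__":
    cv_auc, test_auc = evaluate()
    print(f"cross-validated AUROC: {cv_auc:.2f}, independent test AUROC: {test_auc:.2f}")
```

The test set is touched exactly once, after all cross-validation, so its score is not influenced by model selection decisions.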
OBJECTIVES: Monogenic inflammatory bowel disease (IBD) comprises rare Mendelian causes of gut inflammation, often presenting in infants with severe and atypical disease. This study aimed to identify clinically relevant variants within 68 monogenic IBD genes in an unselected pediatric IBD cohort. METHODS: Whole exome sequencing was performed on patients with pediatric-onset disease. Variants fulfilling the American College of Medical Genetics criteria for “pathogenic” or “likely pathogenic” were assessed against phenotype at diagnosis and follow-up. Individual patient variants were assessed and processed to generate a per-gene, per-individual deleteriousness score. RESULTS: A total of 401 patients were included, and the median age at disease onset was 11.92 years. In total, 11.5% of patients harbored a monogenic variant. TRIM22-related disease was implicated in 5 patients. A pathogenic mutation in the Wiskott-Aldrich syndrome (WAS) gene was confirmed in 2 male children with severe pancolonic inflammation and primary sclerosing cholangitis. In total, 7.3% of patients with Crohn's disease had apparent autosomal recessive, monogenic NOD2-related disease. Compared with non-NOD2 Crohn's disease, these patients had a marked stricturing phenotype (odds ratio 11.52, significant after correction for disease location) and had undergone significantly more intestinal resections (odds ratio 10.75). Variants in ADA, FERMT1, and LRBA did not meet the criteria for monogenic disease in any patient; however, case-control analysis of mutation burden significantly implicated these genes in disease etiology. DISCUSSION: Routine whole exome sequencing in pediatric patients with IBD provides a precise molecular diagnosis for a subset of patients, creating the opportunity to personalize therapy. NOD2 status informs the risk of stricturing disease requiring surgery, helping clinicians to guide prognosis and intervention.
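The odds ratios above compare the odds of an outcome (such as stricturing) in NOD2-related versus non-NOD2 Crohn's disease. A minimal sketch of a crude odds ratio with a Woolf (log-scale) 95% confidence interval from a 2×2 table follows. The counts are hypothetical, and because the reported 11.52 was corrected for disease location, this unadjusted calculation is not the study's method and would not reproduce that figure.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Crude odds ratio and Woolf confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    (hypothetical layout, e.g. exposed = NOD2-related disease, outcome = stricturing)
    """
    if min(a, b, c, d) == 0:
        # The Woolf interval is undefined with an empty cell; a continuity correction would be needed.
        raise ValueError("all four cells must be non-zero")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 10 of 15 exposed vs 20 of 60 unexposed patients have the outcome.
    print(odds_ratio_ci(10, 5, 20, 40))
```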
Background Inflammatory bowel disease may arise from an inadequate immune response to intestinal bacteria. NOD2 is an established gene in Crohn’s disease pathogenesis, with deleterious variation associated with reduced NFKB signaling. We hypothesized that deleterious variation across the NOD2 signaling pathway affects transcription. Methods Ileal biopsies (for targeted autoimmune RNA sequencing) and blood (for whole exome sequencing) were collected from treatment-naïve pediatric inflammatory bowel disease patients at diagnostic endoscopy. Using GenePy, a per-individual, per-gene score, genes within the NOD signaling pathway were assigned a quantitative score representing total variant burden. Where multiple genes formed complexes, GenePy scores were summed to create a “complex” score. Normalized transcript expression of 95 genes within this pathway was retrieved. Regression analysis was performed to determine the impact of genomic variation on gene transcription. Results Thirty-nine patients were included. Limited clustering of patients based on NOD signaling transcripts was related to underlying genomic variation. Patients harboring deleterious variation in NOD2 had reduced NOD2 (β = -0.702, P = 4.3 × 10⁻⁵) and increased NFKBIA (β = 0.486, P = .001) expression, reflecting reduced NFKB signal activation. Deleterious variation in the NOD2-RIPK2 complex was associated with increased NLRP3 (β = 0.8, P = 3.1475 × 10⁻⁸) and reduced TXN (β = -0.417, P = 8.4 × 10⁻⁵) transcription, components of the NLRP3 inflammasome. Deleterious variation in the TAK1-TAB complex was associated with reduced transcription of MAPK14 (β = -0.677, P = 1.7 × 10⁻⁵), a key signal transduction protein in the NOD2 signaling cascade, and increased IFNA1 (β = 0.479, P = .001), indicating reduced transcription of NFKB activators and alternative interferon transcription in these patients. Conclusions Data integration identified perturbation of NOD2 signaling transcription correlated with genomic variation.
A hypoimmune NFKB signaling transcription response was observed. Alternative inflammatory pathways were activated and may represent therapeutic targets in specific patients.
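The scoring and regression steps described above can be sketched in plain Python. This is a hedged illustration, not the GenePy implementation: it assumes a GenePy-style per-gene score of the form S = Σ D_i × (-log10(f1 × f2)), where D_i is a deleteriousness weight for variant i and f1, f2 are the population frequencies of the two alleles the individual carries, and it uses a single-predictor least-squares slope in place of the study's regression model, whose covariates are not given here. The `Variant` class, the frequency floor and all example values are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import fmean

MIN_FREQ = 1e-5  # hypothetical floor so novel alleles (frequency 0) get a finite weight


@dataclass
class Variant:
    deleteriousness: float  # hypothetical weight D, e.g. a pathogenicity score rescaled to [0, 1]
    alt_freq: float         # population frequency of the alternate allele
    genotype: int           # alternate alleles carried by the individual: 0, 1 or 2


def gene_score(variants: list[Variant]) -> float:
    """Per-individual, per-gene burden: sum of D * -log10(f1 * f2) over the gene's variants."""
    total = 0.0
    for v in variants:
        if v.genotype == 0:
            continue  # assumption: homozygous-reference sites are skipped
        alt = max(v.alt_freq, MIN_FREQ)
        ref = max(1.0 - v.alt_freq, MIN_FREQ)
        f1, f2 = (alt, alt) if v.genotype == 2 else (alt, ref)
        # Rarer observed alleles give a smaller product and therefore a larger contribution.
        total += v.deleteriousness * -math.log10(f1 * f2)
    return total


def complex_score(gene_scores: dict[str, float], members: list[str]) -> float:
    """Sum per-gene scores over the genes forming a complex (e.g. NOD2 and RIPK2)."""
    return sum(gene_scores.get(gene, 0.0) for gene in members)


def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y (normalised expression) on x (burden score)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


if __name__ == "__main__":
    nod2 = gene_score([Variant(1.0, 0.01, 1)])   # one rare heterozygous variant
    ripk2 = gene_score([Variant(0.5, 0.1, 2)])   # one commoner homozygous variant
    print(complex_score({"NOD2": nod2, "RIPK2": ripk2}, ["NOD2", "RIPK2"]))
```

A negative slope corresponds to the "reduced" transcripts reported above and a positive slope to the "increased" ones.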
Background/Aims Crohn’s disease (CD) arises through host-environment interaction. Abnormal gene expression results from disturbed pathway activation or response to bacteria. We aimed to determine activated pathways and driving cell types in paediatric CD. Methods - We applied contemporary targeted autoimmune RNA sequencing, in parallel with single-cell sequencing, to ileal tissue from paediatric CD patients and controls. Weighted gene co-expression network analysis (WGCNA) was performed and differentially expressed genes (DEGs) were determined. We integrated clinical data to identify co-expression modules associated with outcomes. Results - Twenty-seven treatment-naive CD (TN-CD) patients, 26 established-CD patients and 17 controls were included. WGCNA revealed a 31-gene signature characterising TN-CD patients, but not established-CD patients or controls. The CSF3R gene is a hub within this module and is key in neutrophil expansion and differentiation. Antimicrobial genes, including S100A12 and the calprotectin subunit S100A9, were significantly upregulated in TN-CD compared with controls (p = 2.61 × 10⁻¹⁵ and p = 9.13 × 10⁻¹⁴, respectively) and with established-CD (both p = 0.0055). Gene-enrichment analysis confirmed upregulation of the IL17, NOD and oncostatin M signalling pathways in TN-CD patients, identified in both the WGCNA and DEG analyses. An upregulated gene signature was enriched for transcripts promoting Th17-cell differentiation and correlated with prolonged time to relapse (correlation-coefficient-0.36, p=0.07). Single-cell sequencing of TN-CD patients identified specialised epithelial cells driving differential expression of S100A9. Cell groups, determined by single-cell gene expression, demonstrated enrichment of IL17 signalling in monocytes and epithelial cells. Conclusion - Ileal tissue from treatment-naive paediatric patients shows significant upregulation of genes driving IL17, NOD and oncostatin M signalling.
This signal is driven by a distinct subset of epithelial cells expressing antimicrobial gene transcripts.
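Hub genes such as CSF3R are, in WGCNA terms, the genes with the highest connectivity within a co-expression module. A simplified sketch follows, assuming an unsigned network in which the adjacency between two genes is |r|^β and a soft-thresholding power β = 6 (an illustrative choice; WGCNA normally selects β from the data). It omits the topological overlap matrix and module detection steps, and the expression values are hypothetical.

```python
import math
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two expression profiles measured on the same samples."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intramodular_connectivity(expr: dict[str, list[float]], power: int = 6) -> dict[str, float]:
    """k_i = sum over j != i of |cor(i, j)| ** power (unsigned soft-threshold adjacency)."""
    return {
        gene: sum(abs(pearson(profile, expr[other])) ** power
                  for other in expr if other != gene)
        for gene, profile in expr.items()
    }


def hub_gene(expr: dict[str, list[float]], power: int = 6) -> str:
    """Return the gene with the highest intramodular connectivity."""
    k = intramodular_connectivity(expr, power)
    return max(k, key=k.get)


if __name__ == "__main__":
    # Hypothetical module: four genes measured across five samples.
    module = {
        "G1": [1, 2, 3, 4, 5],
        "G2": [1, 2, 3, 5, 4],
        "G3": [2, 1, 3, 4, 5],
        "G4": [5, 3, 1, 2, 4],
    }
    print(hub_gene(module))  # G1
```

Raising |r| to a power suppresses weak correlations (here G4's links contribute almost nothing), so connectivity is dominated by a gene's strongest co-expression partners.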
The precise role of periostin, an extracellular matrix protein, in inflammatory bowel disease (IBD) is unclear. Here, we investigated periostin in paediatric IBD, including its relationship with disease activity, clinical outcomes, genomic variation and expression in colonic tissue. Plasma periostin was measured by ELISA in 144 paediatric patients and 38 controls. Plasma levels were assessed against validated IBD disease activity indices and clinical outcomes. Immunofluorescence staining for periostin and detailed isoform-expression analysis of colonic tissue were performed in 23 individuals. We integrated a whole-gene burden metric, ‘GenePy’, to assess the impact of variation in POSTN and 23 other genes functionally connected to periostin. Plasma periostin levels were significantly higher during remission than in active Crohn’s disease. Immunofluorescence analysis demonstrated enhanced peri-cryptal ring patterns in patients compared with controls, present in inflamed as well as macroscopically non-inflamed colonic tissue. Interestingly, the isoform pattern was unchanged during bowel inflammation compared with healthy controls. In addition to its role in the inflammatory processes of IBD, periostin may have a prominent role in mucosal repair. Further studies will be needed to understand its role in the pathogenesis, repair and fibrosis of IBD.
Exome sequencing analysis
Genomic DNA was extracted from peripheral venous blood or saliva using the salting-out method [1]. Fragmented DNA then underwent adaptor ligation and exome library enrichment using the Agilent SureSelect Human All Exon capture kit (V5 & V6), and was sequenced on Illumina HiSeq platforms. Reads were aligned to the human genome (hg38) using the Burrows-Wheeler Aligner (BWA) [2], variants were called using the Genome Analysis Toolkit (GATK v3.6), and ANNOVAR was used for variant annotation [3]. NOD2 variants were reported in line with previously published data. Briefly, variants with a CADD score >15 and a minor allele frequency <0.01 (or novel), or variants reported as pathogenic in the ClinVar database or the Human Gene Mutation Database, were reported [4]. Variants were categorised according to American College of Medical Genetics (ACMG) guidance to remove 'benign' variants and identify 'pathogenic' and 'likely pathogenic' variants [5].
Statistical analysis and data visualisation
Statistical analysis was performed using GraphPad Prism software, version 7. Cytokine induction in the patient cohort and controls was compared using two-tailed unpaired t-tests. For hierarchical clustering and radar plots, raw cytokine data were normalised using RobustScaler and StandardScaler, respectively, from the Python scikit-learn package (version 0.19.01). Normalisation of raw data is a common requirement for machine learning applications, because many algorithms assume that feature values vary on comparable scales.
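The normalisation point above can be made concrete by writing out the default transforms that StandardScaler and RobustScaler apply to each feature. The sketch below is plain Python rather than scikit-learn itself; it assumes the documented defaults (population standard deviation for StandardScaler; median and the 25th to 75th percentile range for RobustScaler) and, like scikit-learn, keeps a scale of 1 when the computed scale would be zero. The cytokine values are hypothetical.

```python
from statistics import fmean, median, pstdev, quantiles


def standard_scale(values: list[float]) -> list[float]:
    """Centre on the mean and divide by the population standard deviation."""
    mu = fmean(values)
    sd = pstdev(values) or 1.0  # constant feature: keep scale 1 to avoid division by zero
    return [(v - mu) / sd for v in values]


def robust_scale(values: list[float]) -> list[float]:
    """Centre on the median and divide by the interquartile range (75th minus 25th percentile)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # linear interpolation between points
    iqr = (q3 - q1) or 1.0
    med = median(values)
    return [(v - med) / iqr for v in values]


if __name__ == "__main__":
    # Hypothetical cytokine readings for one feature, with one extreme outlier.
    readings = [1.0, 2.0, 3.0, 4.0, 100.0]
    print(standard_scale(readings))
    print(robust_scale(readings))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier dominates the standard deviation, so standard scaling squeezes the other readings together, whereas the median and interquartile range used by robust scaling are barely affected by it.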
Background Inflammatory bowel disease (IBD) is a chronic gastrointestinal disease with an unpredictable disease course. Computational methods such as machine learning (ML) have the potential to stratify IBD patients for the provision of individualized care. The use of ML methods for IBD was surveyed, with an additional focus on how the field has changed over time. Methods On May 6, 2021, a systematic review was conducted through a search of the MEDLINE and Embase databases, with the search structure (“machine learning” OR “artificial intelligence”) AND (“Crohn* Disease” OR “Ulcerative Colitis” OR “Inflammatory Bowel Disease”). Exclusion criteria included studies not written in English, no human patient data, publication before 2001, studies that were not peer reviewed, nonautoimmune disease comorbidity research, and record types that were not primary research. Results Seventy-eight (of 409) records met the inclusion criteria. Random forest methods were most prevalent, and there was an increase in neural networks, mainly applied to imaging data sets. The main clinical tasks addressed with ML were diagnosis (18 of 78), disease course (22 of 78), and disease severity (16 of 78). The median sample size was 263. Clinical and microbiome-related data sets were most popular. Five percent of studies used an external data set after training and testing for additional model validation. Discussion Availability of longitudinal and deep phenotyping data could lead to better modeling. Machine learning pipelines that account for imbalanced data and perform feature selection only on training data will generate more generalizable models. Machine learning models are increasingly being applied to more complex clinical tasks for specific phenotypes, indicating progress towards personalized medicine for IBD.
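A minimal sketch of the two pipeline recommendations, assuming scikit-learn: feature selection is placed inside a `Pipeline` so that it is refit on the training portion of each cross-validation fold only, and class imbalance is handled with stratified folds and `class_weight="balanced"`. The synthetic data set, `SelectKBest` with k = 10, and logistic regression are illustrative assumptions, not methods taken from the reviewed studies.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def leakage_free_cv_auc(random_state: int = 0) -> float:
    """Mean cross-validated AUROC with feature selection fitted inside each training fold."""
    # Hypothetical imbalanced data set: roughly 80% / 20% class split.
    X, y = make_classification(n_samples=400, n_features=50, n_informative=5,
                               weights=[0.8, 0.2], random_state=random_state)
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        # As a pipeline step, selection only ever sees the training part of each fold.
        ("select", SelectKBest(score_func=f_classif, k=10)),
        # class_weight="balanced" reweights samples inversely to class frequency.
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])
    # Stratified folds keep the class ratio similar in every fold.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    return float(cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc").mean())


if __name__ == "__main__":
    print(f"mean cross-validated AUROC: {leakage_free_cv_auc():.2f}")
```

Selecting features on the full data set before cross-validation would let information from the held-out folds leak into model fitting and inflate the estimated performance.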
Background The rise of ‘big data’ in inflammatory bowel disease (IBD) presents an opportunity to improve understanding of pathogenesis and unpick the molecular complexity of this heterogeneous condition. Personalisation of IBD management relies on predicting outcomes and response to therapy, and on preventing complications. Here, we present results on subgrouping of patients and outcome prediction using multiomic and clinical data. Methods Using whole exome sequencing from 1100 patients in the Southampton IBD cohort, including 650 paediatric cases, we performed iterative studies focused on 1) the impact of genomic variation across the NOD-signaling pathway, measured by perturbation of transcription across multiple genes, 2) development of NOD2 as a genomic biomarker of stricturing Crohn’s disease (CD), and 3) use of machine learning and genomic data to develop disease classification models. These studies use GenePy, an in-house tool that summarises genomic variation as a per-individual, per-gene deleteriousness metric. Results Within the NOD-signaling pathway, patients harbouring deleterious variation in NOD2 had reduced NOD2 expression and increased NFKBIA expression, reflecting reduced NFKB signaling, figure 1A. We report that deleterious variation in several key complexes, including NOD2-RIPK2 and TAK1-TAB, resulted in reduced transcription of NFKB activators and alternative inflammatory pathway activation, figure 1C-D. Using genomic data, we constructed a NOD2 prediction model for stricturing disease in Crohn’s disease; 56.7% of patients in the ‘high-risk group’ had stricturing behaviour, whereas only 21.4% of those in the low-risk group had strictures. Addition of terminal ileal (TI) disease to the NOD2 risk groups significantly improved prediction, figure 2A. In survival modelling, high-risk paediatric patients presenting with TI disease had a HR of 4.89 (P = 2.3 × 10⁻⁵) compared with low-risk patients without TI disease, figure 2B.
Finally, we used supervised machine learning on genomic data to classify patients as having CD or ulcerative colitis. We used different gene lists and assessed how accurately each assigned patients to their diagnosis. An autoimmune gene panel produced the best model (AUROC 0.68), compared with an IBD panel (AUROC 0.61). NOD2 was the most discriminating gene in all the gene panels. Conclusion These iterative projects demonstrate the utility of integrating genomic and clinical data to improve the subtyping of patients with IBD and to provide disease prediction models. Future work will include analyses of additional inflammatory pathways and targeting different clinical outcomes. We hope clinical translation of these findings will be a step-change in precision medicine for IBD.
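The AUROC values above can be read as the probability that a randomly chosen patient from one class receives a higher model score than a randomly chosen patient from the other. A minimal pairwise sketch of that definition follows, with ties counted as one half; encoding CD as 1 and ulcerative colitis as 0, and the scores themselves, are hypothetical. Pairwise comparison is quadratic in sample size, so a library routine such as scikit-learn's `roc_auc_score` is the practical choice for real data.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Probability that a positive case outscores a negative case (ties count as 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical: 1 = Crohn's disease, 0 = ulcerative colitis; scores from some classifier.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(auroc(labels, scores))  # 0.75
```

An AUROC of 0.5 means the scores separate the two diagnoses no better than chance, which puts the reported 0.61 and 0.68 in context.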