2021
DOI: 10.1093/nargab/lqab065

DeepCOMBI: explainable artificial intelligence for the analysis and discovery in genome-wide association studies

Abstract: Deep learning has revolutionized data science in many fields by greatly improving prediction performances in comparison to conventional approaches. Recently, explainable artificial intelligence has emerged as an area of research that goes beyond pure prediction improvement by extracting knowledge from deep learning methodologies through the interpretation of their results. We investigate such explanations to explore the genetic architectures of phenotypes in genome-wide association studies. Instead of testing …

Citations: cited by 26 publications (25 citation statements)
References: 96 publications
“…Machine learning algorithms, from simple general linear regression [71], PCA [22,60], to random forest [72], extreme gradient boosting [73], as well as neural networks [74], have enabled us to capture the systematic signatures of biological or genetic patterns from genomic samples, allowing for the association of genes to phenotypes/diseases and facilitating molecular-based medical applications [75][76][77]. KLFDAPC represents a new addition to the population genomics toolbox but it is also potentially applicable to other Omics data throughout the biological sciences, including applications in medicine and agriculture.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Thus far, studies that examined MTX response in RA patients mainly focussed on statistical association of non-genetic factors 74 and/or genetic variants employing either the very popular genome-wide association studies (GWAS) 30 or gene specific association analyses interrogating specific genes in the MTX/RA pathways 31,32 with MTX response. This popular classical statistical approach employs Raw P-Value Thresholding (RPVT), 75 where a P-value is assigned to each SNP; and the inferred confidence of a variant in accounting for the phenotype in the dataset is assessed by its statistical significance via comparison to a predefined threshold that considers a balance between Type I and Type 2 errors. However, since statistics merely derive population inference of a relationship between the data and the outcome variable from a sample 76 ; and its main purpose is not to make prediction of a future dataset, statistically significant association in one dataset are not necessarily predictive of the outcome in a future dataset.…”
Section: Articles (citation type: mentioning)
confidence: 99%
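The raw P-value thresholding (RPVT) procedure described in the statement above can be illustrated with a minimal sketch. It is not taken from the cited paper or from DeepCOMBI: the function names, the allele counts, and the choice of a 1-degree-of-freedom allelic chi-square test are assumptions made for illustration, and the 5e-8 cut-off is the conventional genome-wide significance level rather than a value given in the text.

```python
import math

def allelic_chi2_pvalue(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """P-value of a 1-df chi-square test on a 2x2 allele-count table.

    Hypothetical helper: rows are cases/controls, columns are
    alternate/reference allele counts.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected == 0:
                return 1.0  # degenerate table: no evidence of association
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def rpvt(snp_counts, threshold=5e-8):
    """Test each SNP independently and keep those with p below threshold."""
    pvalues = {snp: allelic_chi2_pvalue(*counts)
               for snp, counts in snp_counts.items()}
    return {snp: p for snp, p in pvalues.items() if p < threshold}

# Made-up allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
snp_counts = {
    "snp_a": (300, 700, 150, 850),  # strong allele-frequency difference
    "snp_b": (250, 750, 245, 755),  # almost no difference
}
significant = rpvt(snp_counts)
print(sorted(significant))
```

Each SNP is tested on its own, so the sketch also shows the limitation raised in the citing text: interactions between SNPs never enter the test, and the strict threshold needed for many parallel tests hides variants with small effects.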
“…4–6 Furthermore, as statistical approach evaluates individual SNP independently and in parallel, it does not consider potential higher order interactions amongst SNPs 77,78 and is less able to identify variants with small effects because of statistical power constraints due to excessive multiple testing. 75 Additionally, as classical statistical approach was originally designed for datasets with limited dependent and independent variables, statistical inferences are less precise with large number of variables as observed in GWAS studies since the possible associations among the many variables also increase drastically leading to more complex relationships. 76 On the other hand, the Machine Learning (ML) approach employed in this study is particularly suited for dealing with rich, unwieldy, 'wide' data where the independent variables (e.g.…”
Section: Articles (citation type: mentioning)
confidence: 99%
“…By using the available genomic sequences from different varieties of a certain crop species, these deep learning-based prediction methods can identify SNPs associated with the trait of interest. The machine learning algorithms are first trained with a combination of data including genotypic, phenotypic, agronomic practices and environmental data before it is used on a test dataset for predicting SNPs (Wang et al, 2020;Mieth et al, 2021). This is just one of the applications of AI and deep learning to accelerate knowledge discovery.…”
Section: Interdisciplinary Approaches For Hemp Biology (citation type: mentioning)
confidence: 99%