2016
DOI: 10.1186/s13059-016-1112-z

DIVAN: accurate identification of non-coding disease-specific risk variants using multi-omics profiles

Abstract: Understanding the link between non-coding sequence variants, identified in genome-wide association studies, and the pathophysiology of complex diseases remains challenging due to a lack of annotations in non-coding regions. To overcome this, we developed DIVAN, a novel feature selection and ensemble learning framework, which identifies disease-specific risk variants by leveraging a comprehensive collection of genome-wide epigenomic profiles across cell types and factors, along with other static genomic feature…

Cited by 75 publications (79 citation statements)
References 39 publications

“…However, IW-Scoring still allows for the functional variants associated with specific tissues, cells and features to be identified through the regulatory annotation module. This is currently lacking in many other methods, although some algorithms have chosen to focus on the identification of disease/tissue specific risk variants recently [22, 41]. Compared to most available methods, we believe our approach is optimally balanced between summarised and detailed evidences for the diverse range of users.…”
Section: Discussion (mentioning)
confidence: 99%
“…Via a vigorous weight learning process, strong weights were assigned to the block of closely correlated scores (Eigen, DeepSEA, FATHMM noncoding, ReMM and CADD), and the derived IW-Scoring significantly outperformed individual constituent scores (including Eigen and Eigen-PC) across various data sets, demonstrating the accuracy and validity of our approach. Such ensemble based approach with different estimated weights has been shown to perform better than any single component classifier [26], and has been widely used in various bioinformatics problems [41, 42]. The weighted integration technique based on the eigendecomposition of the covariance matrix also offers the flexibility to incorporate any other correlated genome-wide functional scores/features into the integrative scores.…”
Section: Discussion (mentioning)
confidence: 99%
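The weighted integration described in this statement, where weights come from the eigendecomposition of the covariance matrix of correlated functional scores, can be illustrated with a minimal sketch. It assumes the weights are the entries of the leading eigenvector of the covariance matrix of standardized scores, scaled to unit L1 norm; the quoted text does not give IW-Scoring's actual weight-learning procedure, so the function name `eigen_weighted_score` and the normalization are hypothetical.

```python
import numpy as np


def eigen_weighted_score(scores: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: combine correlated per-variant scores into one score.

    scores: shape (n_variants, n_scores), one column per functional score
    (e.g. CADD, Eigen, DeepSEA). Returns (weights, combined).
    """
    # Standardize each column so that differing score scales do not dominate.
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    # Covariance of the standardized scores (i.e. their correlation matrix).
    cov = np.cov(z, rowvar=False)
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so the last eigenvector belongs to the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(cov)
    lead = eigvecs[:, -1]
    # Eigenvectors are sign-ambiguous; flip so the weights sum to a positive value.
    if lead.sum() < 0:
        lead = -lead
    # Assumed normalization: unit L1 norm.
    weights = lead / np.abs(lead).sum()
    combined = z @ weights
    return weights, combined
```

Columns that belong to a tightly correlated block dominate the leading eigenvector and therefore receive the largest weights, which mirrors the behavior the statement reports for Eigen, DeepSEA, FATHMM non-coding, ReMM and CADD. An uncorrelated column receives a weight near zero.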
“…Such variants do not change the functionality of a gene product but instead are thought to affect transcript levels through distant gene regulation. In support of this concept, many of these genomic sites display transcription regulatory potential in disease-relevant cell-types, as judged from their epigenetic signatures (1000 Genomes Project Consortium et al, 2010; Chen et al, 2016; Maurano et al, 2012). Gene regulation over distance requires chromatin looping to bring distal regulatory DNA elements in close proximity to target genes.…”
Section: Hi-C To Link Disease Variants To Genes (mentioning)
confidence: 99%
“…However, the identification of target genes and underlying mechanisms remains challenging. Indeed, more than 90% of disease-associated variation resides in noncoding DNA (1000 Genomes Project Consortium et al, 2010; Chen et al, 2016; Maurano et al, 2012).…”
Section: Introduction (mentioning)
confidence: 99%
“…Feature selection and outlier removal were employed to achieve the best performance. The optimal feature set was selected based on the largest area under the receiver operating characteristic curve (ROC-AUC) value as described in previous study [27]. Briefly, the confidence of each feature was measured by p values based on Wilcoxon rank sum test.…”
Section: Feature Selection and Outlier Removal (mentioning)
confidence: 99%
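The selection procedure in this statement can be sketched as follows: rank features by Wilcoxon rank-sum p-value (risk variants vs. controls), then keep the nested top-k feature set with the largest ROC-AUC. This is a minimal illustration under stated assumptions, not the code of [27] or DIVAN. The helpers `auc` and `select_features` are hypothetical. In place of a trained classifier, the sketch scores each candidate set by an unweighted mean of standardized, sign-oriented features, and it measures AUC on the same data used for selection. A real pipeline would use held-out data. Outlier removal is not shown.

```python
import numpy as np
from scipy.stats import rankdata, ranksums


def auc(score: np.ndarray, labels: np.ndarray) -> float:
    """ROC-AUC via the Mann-Whitney U identity; tied scores get average ranks."""
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    ranks = rankdata(score)
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))


def select_features(X: np.ndarray, y: np.ndarray) -> tuple[list[int], float]:
    """Hypothetical helper: rank features by Wilcoxon rank-sum p-value, then keep
    the top-k prefix whose combined score gives the largest ROC-AUC."""
    pos, neg = X[y == 1], X[y == 0]
    # Confidence of each feature: two-sided Wilcoxon rank-sum p-value.
    pvals = np.array([ranksums(pos[:, j], neg[:, j]).pvalue for j in range(X.shape[1])])
    order = np.argsort(pvals)  # most significant feature first
    # Orient each feature so that higher values point toward the risk class.
    sign = np.sign(pos.mean(axis=0) - neg.mean(axis=0))
    sign[sign == 0] = 1
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    z = (X - X.mean(axis=0)) / std * sign
    best_k, best_auc = 1, -np.inf
    for k in range(1, X.shape[1] + 1):
        # Assumed stand-in for a trained classifier: mean of the top-k oriented features.
        score = z[:, order[:k]].mean(axis=1)
        a = auc(score, y)
        if a > best_auc:
            best_k, best_auc = k, a
    return order[:best_k].tolist(), best_auc
```

Evaluating only nested prefixes of the p-value ranking keeps the search linear in the number of features instead of exponential. Adding weakly informative features tends to lower the AUC of the combined score, so the selected prefix usually stops at the informative features.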