2020
DOI: 10.1093/nar/gkaa1137

Predicting regulatory variants using a dense epigenomic mapped CNN model elucidated the molecular basis of trait-tissue associations

Abstract: Assessing the causal tissues of human complex diseases is important for the prioritization of trait-associated genetic variants. Yet, the biological underpinnings of trait-associated variants are extremely difficult to infer due to statistical noise in genome-wide association studies (GWAS), and because >90% of genetic variants from GWAS are located in non-coding regions. Here, we collected the largest human epigenomic map from ENCODE and Roadmap consortia and implemented a deep-learning-based convoluti…

Citations: cited by 18 publications (16 citation statements)
References: 63 publications (118 reference statements)
“…All these facts considerably decrease the efficacy of the available methods for TFBS recognition, most of which are based on the PWM model, which oversimplifies the mechanisms underlying TF–DNA interaction [ 66 , 68 , 69 , 70 ]. Development of new generation bioinformatics approaches relying on machine learning and neural networks raises the hope for more efficient and accurate recognition of both the TFBSs and rSNPs in the genomes [ 190 , 191 , 192 , 193 , 194 , 195 ].…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…This resulted in a median AUPRC increase of 0.502 and 0.321 in DNase-seq and histone mark assays, respectively. More details of CNN model construction and performance for each feature can be found in our recent work ( 26 ).…”
Section: Methods; citation type: mentioning
confidence: 99%
“…For each variant, DeepFun considers its neighboring 1000 bp region for context information, and then predicts the active (accessibility or binding) probability of sequence(s) containing either reference allele or alternative allele, respectively. To evaluate the impact of variant, we implemented the previously defined SNP Activity Difference (SAD) or relative log fold change of odds (log-odds) difference between the two alleles ( 26 ).…”
Section: Methods; citation type: mentioning
confidence: 99%
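
The allele comparison described in the citation statement above can be sketched roughly as follows. This is a minimal illustration, not the DeepFun implementation: it assumes SAD is the alternative-allele probability minus the reference-allele probability, and that the log-odds difference is the difference of the two logits. The function names `log_odds` and `allele_effect` and the clipping constant `eps` are hypothetical.

```python
import math


def log_odds(p, eps=1e-6):
    """Logit of a probability, clipped away from 0 and 1 (eps is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def allele_effect(p_ref, p_alt):
    """Compare predicted activity for the reference and alternative alleles.

    p_ref, p_alt: model-predicted probabilities (accessibility or binding)
    for the sequence carrying each allele.
    Returns (sad, log_odds_diff) under the assumed definitions:
      sad           = p_alt - p_ref
      log_odds_diff = logit(p_alt) - logit(p_ref)
    """
    sad = p_alt - p_ref
    log_odds_diff = log_odds(p_alt) - log_odds(p_ref)
    return sad, log_odds_diff


# Hypothetical predictions for one variant in one assay.
sad, lod = allele_effect(p_ref=0.2, p_alt=0.5)
print(f"SAD = {sad:.3f}, log-odds difference = {lod:.3f}")
```

Under these assumptions, a positive value on either score means the alternative allele is predicted to be more active than the reference allele; the log-odds form is more sensitive to changes near 0 or 1.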