2022
DOI: 10.1167/tvst.11.4.16

Machine Learning Prediction of Non-Coding Variant Impact in Human Retinal cis-Regulatory Elements

Abstract: Purpose: Prior studies have demonstrated the significance of specific cis-regulatory variants in retinal disease; however, determining the functional impact of regulatory variants remains a major challenge. In this study, we utilized a machine learning approach, trained on epigenomic data from the adult human retina, to systematically quantify the predicted impact of cis-regulatory variants. Methods: We used human re…

citations
Cited by 7 publications
(9 citation statements)
references
References 64 publications
(78 reference statements)
“…Therefore, we applied a gapped k-mer SVM approach (gkm-SVM) to our datasets that has been optimized to detect k-mers of similar length to typical TF-binding motifs (Ghandi et al, 2016). Support vector machine (SVM) algorithms have been utilized in a variety of contexts to perform classification of DNA sequences in a supervised manner (Barozzi et al, 2014; Ghandi et al, 2016; Van den Bosch et al, 2022). It should be noted that k-mers identified by gkm-SVM are simply DNA sequences of k length that can discriminate two sets of input sequences, and do not necessarily correspond to TF-binding sites per se.…”
Section: Results
Type: mentioning
confidence: 99%
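
The gapped k-mer idea mentioned in the statement above can be illustrated with a minimal sketch. It only counts gapped k-mer features (windows of length `l` with `k` informative positions and the rest masked); it is not the gkm-SVM implementation, and the function name, the `.` gap symbol, and the small `l`/`k` values are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmers(seq, l=6, k=4):
    """Count gapped k-mers in a DNA sequence.

    Every window of length l contributes one feature per choice of k
    informative positions; the remaining l - k positions are masked as '.'.
    The parameter values are illustrative, not those of any published model.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        # Skip windows containing ambiguous bases such as N.
        if set(window) - set("ACGT"):
            continue
        for keep in combinations(range(l), k):
            keep_set = set(keep)
            pattern = "".join(
                base if j in keep_set else "." for j, base in enumerate(window)
            )
            counts[pattern] += 1
    return counts


if __name__ == "__main__":
    features = gapped_kmers("ACGTACGT", l=6, k=4)
    for pattern, n in sorted(features.items())[:5]:
        print(pattern, n)
```

Feature vectors of this kind, computed for two sets of sequences, are what a classifier would then use to find the patterns that discriminate the sets; as the quoted statement notes, such patterns need not be TF-binding sites.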
“…Machine learning presents an opportunity to train better models of cis-regulatory grammars, due to its power to discover predictive features in high dimensional data. Deep neural network models trained on large epigenomic datasets often predict TF binding and chromatin accessibility with high accuracy, and these models have revealed important contextual features of local DNA sequence that determine TF binding (25–32). However, models trained on massively parallel reporter gene assays (MPRAs) to predict CRE activity (20, 33–40) often perform less well than binding models, likely because the cis-regulatory grammars that govern activity depend on additional higher-order interactions between bound TFs and their associated co-factors (2, 4–6, 41).…”
Section: Introduction
Type: mentioning
confidence: 99%
“…One approach to address this challenge is implementing predictive models to identify variants that create or disrupt TF binding sites (TFBS). [23–25] Large-scale gapped k-mer (LS-GKM) support vector machine (SVM) predictive models can be trained to identify TFBS by using in vitro or in vivo DNA-binding data, such as chromatin immunoprecipitation followed by sequencing (ChIP-seq). LS-GKM-SVM models outperform traditional approaches, such as position weight matrix (PWM)-based methods, by considering complex sequence features like dinucleotide interactions, longer/gapped k-mers, and intracellular patterns.…”
Section: Introduction
Type: mentioning
confidence: 99%
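
One way to picture how a sequence model can flag variants that create or disrupt a binding site, as described in the statement above, is to score the reference and alternate alleles and take the difference. The sketch below assumes a hypothetical dictionary of k-mer weights standing in for a trained model; the weights, function names, and `k=3` are invented for illustration and do not come from any of the cited papers.

```python
def score_sequence(seq, weights, k=3):
    """Sum the weights of all contiguous k-mers in seq (unknown k-mers score 0)."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def variant_delta(ref_seq, pos, alt_base, weights, k=3):
    """Score change caused by substituting alt_base at 0-based position pos.

    A negative value suggests the variant removes sequence the model favours;
    a positive value suggests it creates such sequence.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return score_sequence(alt_seq, weights, k) - score_sequence(ref_seq, weights, k)


if __name__ == "__main__":
    # Toy weights (assumed): a real model would learn these from data.
    toy_weights = {"TAA": 1.0, "AAT": 0.5}
    print(variant_delta("GTAATC", pos=2, alt_base="C", weights=toy_weights))
```

Only k-mers overlapping the substituted position change between the two alleles, so the difference isolates the local effect of the variant.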
“…[26–29] LS-GKM-SVM predictive models can be trained with ChIP-seq data from specific cell lines or tissue to integrate relevant epigenomic and regulatory context. [23] In this work, we present an integrative approach to prioritize functional non-coding variants that can contribute to the biology of CVDs. Using publicly accessible data from the GWAS catalog [30], GTEx Portal [31], ENCODE [32], ChIP-Atlas [33], and Remap [34], we compiled a list of CVD-associated SNPs linked with a differentially expressed gene in cardiac tissue.…”
Section: Introduction
Type: mentioning
confidence: 99%