2019
DOI: 10.1093/bioinformatics/btz322

GkmExplain: fast and accurate interpretation of nonlinear gapped k-mer SVMs

Abstract: Support Vector Machines with gapped k-mer kernels (gkm-SVMs) have been used to learn predictive models of regulatory DNA sequence. However, interpreting predictive sequence patterns learned by gkm-SVMs can be challenging. Existing interpretation methods such as deltaSVM, in-silico mutagenesis (ISM) or SHAP either do not scale well or make limiting assumptions about the model that can produce misleading results when the gkm kernel is combined with nonlinear kernels. Here, we propose Gk…

Cited by 49 publications (55 citation statements)
References: 10 publications
“…Next, we used three complementary approaches, GkmExplain [37], in silico mutagenesis [38], and deltaSVM [39], to predict the allelic impact of 1677 candidate SNPs on chromatin accessibility in each cluster by providing the sequences corresponding to both alleles of each SNP to the models for each of the 24 clusters. All three approaches showed high concordance of predicted allelic effects across all candidate SNPs (Supplementary Fig.…”
Section: Results (mentioning)
confidence: 99%
“…For each SNP in a peak in each of the clusters, we computed GkmExplain [37] importance scores for each position in each of the 1000 bp effect and non-effect allele sequences using each of the 10 gkm-SVM [36] models for the respective cluster. GkmExplain is a method to infer the importance or predictive contribution of every base in an input sequence to its corresponding output prediction from a gkm-SVM model.…”
Section: Methods (mentioning)
confidence: 99%
“…While CNNs have the potential to outperform these simpler models, they require careful attention to the selection of adequate architectures and hyperparameter optimization. While not a focus of this work, models may be further interpreted with respect to their sequence features learned [41,42], in order to shed more light upon the sequence encoding of gene regulation.…”
Section: Discussion (mentioning)
confidence: 99%
“…We compare the predictive and attribution performance of SVM to CNN models. Details of training and attribution with mutagenesis and GkmExplain, an integrated gradient method [31], can be found in the supplementary material.…”
Section: K-mer-based Methods (mentioning)
confidence: 99%