2018
DOI: 10.1101/gr.226852.117

A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction

Abstract: The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites …
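The idea of "k-mers that are overrepresented at TF binding sites" can be illustrated with a small sketch. This is not KMAC and does not build a KSM (it performs no alignment of k-mers); it only shows one simple way to select k-mers that occur in a larger fraction of bound sequences than of background sequences. The function names, the `min_ratio` threshold, and the pseudocount are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Use a set so each sequence contributes at most once per k-mer.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


def overrepresented_kmers(bound, background, k, min_ratio=2.0):
    """Return k-mers enriched in `bound` relative to `background`.

    Hypothetical criterion: fraction of bound sequences containing the k-mer
    divided by the (pseudocount-smoothed) fraction of background sequences.
    """
    pos = count_kmers(bound, k)
    neg = count_kmers(background, k)
    selected = set()
    for kmer, n_pos in pos.items():
        frac_pos = n_pos / len(bound)
        # Pseudocount avoids division by zero for k-mers absent from background.
        frac_neg = (neg[kmer] + 1) / (len(background) + 1)
        if frac_pos / frac_neg >= min_ratio:
            selected.add(kmer)
    return selected


bound = ["AAGATAAGCC", "TTGATAAGGA", "CCGATAAGTT"]
background = ["ACGTACGTAC", "TTTTCCCCGG", "GGGGAAAACC"]
enriched = overrepresented_kmers(bound, background, k=6)
```

In this toy input, only the 6-mer shared by all three bound sequences passes the threshold. A real analysis would use a statistical test and many more sequences.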

Citations: cited by 45 publications (31 citation statements)
References: 63 publications (93 reference statements)
“…The MEME format is supported by the majority of the motif databases ( Kulakovskiy et al 2018 ), and the MEME suite provides packages for integrative analysis and conversion from other motifs formats ( Bailey et al 2009 ). The recently proposed kmer-based motif models also support conversion to MEME format ( Fletez-Brant et al 2013 ; Ghandi et al 2014 ; Zeng et al 2016 ; Guo et al 2018 ). Our package is lightweight and open-source.…”
Section: Results (mentioning)
confidence: 99%
“…However, when in a textual interface, representing PWMs requires an [n] by [m] matrix, where [n] is the number of characters (such as A, C, G, T, for nucleotides), and [m] is the length of the motif. Recently, several studies have shown the usefulness of representing motifs using kmers ( Fletez-Brant et al 2013 ; Ghandi et al 2014 ; Zeng et al 2016 ; Guo et al 2018 ); despite the power of this representation in machine learning models, it is cumbersome to have a set of kmers to characterize a single motif. In many scenarios, motifs can be sufficiently represented by regular expressions of the consensus sequences, such as [GC][AT]GATAAG[GAC] for the GATA2 motif.…”
mentioning
confidence: 99%
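The regular-expression representation quoted above can be tried directly with Python's standard `re` module. The sketch below is illustrative only: the pattern is the one given in the citation statement, while the function name and the example sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# Consensus pattern for the GATA2 motif as quoted in the citation statement.
GATA2_PATTERN = re.compile(r"[GC][AT]GATAAG[GAC]")


def find_motif_hits(sequence, pattern=GATA2_PATTERN):
    """Return (start, matched_text) for each non-overlapping forward-strand hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


# Hypothetical example sequence containing one match.
hits = find_motif_hits("ttCAGATAAGGcc")
```

Note that `finditer` reports non-overlapping matches, so a scan that needs overlapping hits or the reverse strand would require additional handling.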
“…MEME (51) and DREME (52). KMAC innovatively used k-mer set memory for motif representation in order to capture the contribution of nucleotides dependency and flanking k-mers in TF–DNA binding (32). Compared with other state-of-the-art motif finding methods (e.g.…”
Section: Methods (mentioning)
confidence: 99%
“…Compared with other state-of-the-art motif finding methods (e.g. HOMER (35) and ChIPMunk (53)), KMAC achieved the best performance in discovering known motifs from ChIP-seq datasets (32). Gkm-SVM was selected in the comparison as it significantly outperforms traditional kmer-SVM methods by using gapped k-mers for accurately and efficiently identifying longer motifs, which are hard to model as k-mers.…”
Section: Methods (mentioning)
confidence: 99%