2022
DOI: 10.1186/s13059-022-02723-w

BindVAE: Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin

Abstract: We present a novel unsupervised deep learning approach called BindVAE, based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. BindVAE can disentangle an input DNA sequence into distinct latent factors that encode cell-type specific in vivo binding signals for individual TFs, composite patterns for TFs involved in cooperative binding, and genomic context surrounding the binding sites. On the task of retrieving the motifs of expressed TFs in a g…

citations
Cited by 8 publications
(7 citation statements)
references
References 41 publications
“…To the best of our knowledge, just four other tools exist intended to generate de novo motifs from ATAC-seq data, namely BindVAE 15 and MMGraph 16 (both using machine learning in combination with k-mers), CEMIG 17 (which utilizes de Bruijn graphs created on k-mers), and the RSAT peak-motifs pipeline 18 (a pipeline intended for ChIP-seq, which is also applicable to ATAC-seq data). However, BindVAE, CEMIG and RSAT solely operate on the sequences of complete ATAC-seq peaks, therefore these tools lack precision compared to a FP based tool.…”
Section: Results (mentioning)
confidence: 99%
“…To the best of our knowledge, just four other tools exist intended to generate de novo motifs from ATAC-seq data, namely BindVAE 11 and MMGraph 12 (both using machine learning in combination with k-mers), CEMIG 13 (which utilizes de Bruijn graphs created on k-mers), and the RSAT peak-motifs pipeline 14 (a pipeline intended for ChIP-seq, which is also applicable to ATAC-seq data). However, BindVAE, CEMIG and RSAT solely operate on the sequences of complete ATAC-seq peaks, therefore these tools lack precision compared to a FP based tool.…”
Section: Results (mentioning)
confidence: 99%
“…One shortcoming of scover is that only a minority of the motifs end up being used by the model, and this is computationally costly. By contrast, other methods such as BindVAE [ 23 ] have a more efficient motif usage model, and we believe that exploring better motif representations is an important future direction of research. Our results are also consistent with other studies that have categorized TFs as either preferentially binding promoters or enhancers [ 43 ].…”
Section: Discussion (mentioning)
confidence: 99%
“…One set of methods, e.g., BPNet [ 20 ], Enformer [ 21 ], and Basset [ 13 , 22 ], require a signal (e.g., chromatin accessibility) to be associated with the sequence, and they can be used to predict the values for sequences that have not been observed during training. In doing so, these methods learn sequence motifs, but with a multi-layered architecture, the representation is distributed and may be difficult to interpret biologically, although methods such as BindVAE [ 23 ] have been successful in using other approaches. This setup, however, is not ideal for predicting gene expression, so different approaches have been proposed [ 6 , 24 , 25 ].…”
Section: Introduction (mentioning)
confidence: 99%