2021
DOI: 10.1093/bib/bbab303

A comparative benchmark of classic DNA motif discovery tools on synthetic data

Abstract: Hundreds of human proteins were found to establish transient interactions with rather degenerated consensus DNA sequences or motifs. Identifying these motifs and the genomic sites where interactions occur represents one of the most challenging research goals in modern molecular biology and bioinformatics. The last twenty years witnessed an explosion of computational tools designed to perform this task, whose performance was last compared fifteen years ago. Here, we survey sixteen of them, benchmark their a…

Cited by 5 publications (2 citation statements)
References: 52 publications
“…So far, many studies have promoted the concept of the synthetic sequences as efficient for evaluation of the performance of motif finding in ChIP-seq data. This concept implied the generation of synthetic sequences by Markov chains of various orders (33-36 Tompa et al, 2005; Boeva et al, 2016; Jayaram et al, 2016; Castellana et al, 2021), or these sequences were taken as a complete dictionary of k-mers, i.e. equal frequencies of nucleotides were presumed (6 Kulakovskiy and Makeev 2013).…”
Section: Introduction (mentioning)
confidence: 99%
“…Inspired by the work of Ofer et al [24], which considered biological sequences, such as DNA sequences, as human language and used advanced NLP tools to tackle biological tasks, we aimed to model Chinese EHRs as DNA-like sequences and mine linguistic patterns with advanced bioinformatics tools. In a recent review, Castellana et al [25] surveyed 16 classic DNA motif discovery tools and evaluated their ability to discover sequence motifs nested in 29 simulated sequence data sets. The MEME (Multiple EM for Motif Elicitation) motif discovery tool performed best among the 16 classic DNA motif discovery tools.…”
Section: Introduction (mentioning)
confidence: 99%