2011
DOI: 10.1093/nar/gkr574

STEME: efficient EM to find motifs in large data sets

Abstract: MEME and many other popular motif finders use the expectation–maximization (EM) algorithm to optimize their parameters. Unfortunately, the running time of EM is linear in the length of the input sequences. This can prohibit its application to data sets of the size commonly generated by high-throughput biological techniques. A suffix tree is a data structure that can efficiently index a set of sequences. We describe an algorithm, Suffix Tree EM for Motif Elicitation (STEME), that approximates EM using suffix tr…
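To make the "linear in the length of the input" point concrete, here is a minimal sketch of EM for a single motif of width W, assuming a one-occurrence-per-sequence model with a uniform background. It is an illustrative toy, not MEME's or STEME's implementation; the helper names (`em_step`, `consensus`), the pseudocount, the toy sequences and the seed matrix are all hypothetical. The E-step scores every start position in every sequence, so one iteration costs time proportional to the total sequence length times W.

```python
# Toy EM for one motif under a one-occurrence-per-sequence (OOPS) model with a
# uniform background. Hypothetical helpers; not the MEME or STEME code.
ALPHABET = "ACGT"
IDX = {b: i for i, b in enumerate(ALPHABET)}
BACKGROUND = 0.25  # assumed uniform background probability per base


def em_step(seqs, pwm, pseudocount=0.1):
    """Return an updated PWM; pwm[j][k] = P(base ALPHABET[k] at motif column j)."""
    w = len(pwm)
    counts = [[pseudocount] * len(ALPHABET) for _ in range(w)]
    for s in seqs:
        starts = range(len(s) - w + 1)
        # E-step: motif/background likelihood ratio for every start offset.
        # This touches every W-mer of every sequence, so the cost of one
        # iteration grows linearly with the total input length.
        ratios = []
        for i in starts:
            r = 1.0
            for j in range(w):
                r *= pwm[j][IDX[s[i + j]]] / BACKGROUND
            ratios.append(r)
        total = sum(ratios)
        # M-step accumulation: weight each W-mer by its posterior probability
        # of being this sequence's single motif site.
        for i, r in zip(starts, ratios):
            z = r / total
            for j in range(w):
                counts[j][IDX[s[i + j]]] += z
    # Normalise each column to a probability distribution.
    return [[c / sum(row) for c in row] for row in counts]


def consensus(pwm):
    """Most probable base in each column."""
    return "".join(ALPHABET[max(range(len(row)), key=row.__getitem__)] for row in pwm)


# Toy data with "TATA" planted once per sequence; EM is seeded from that W-mer.
seqs = ["GGCTATAGGC", "CCTATACGGG", "GCGCTATACC"]
pwm = [[0.7 if b == c else 0.1 for b in ALPHABET] for c in "TATA"]
for _ in range(5):
    pwm = em_step(seqs, pwm)
print(consensus(pwm))  # TATA
```

Seeding from a candidate W-mer mirrors the common practice of starting EM from promising substrings; a production tool would also work in log space to avoid underflow on long motifs.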

Cited by 45 publications (39 citation statements)

References: 39 publications

“…STEME's EM algorithm implementation has been described in some detail in the original STEME paper [14], so we do not repeat it here.…”
Section: Methods (mentioning; confidence: 99%)

“…Both the suffix tree model [43,44] and projection [45,46] model are used as data structures to look for high quality motifs. Meanwhile, the GA (Genetic Algorithm) [47,48] is applied in the PWM model to help it improve performance and Reid et al [49] have combined the GA algorithm with the suffix tree model to make optimizations and advancements. Based on gradually developed Phylogenetic footprinting [50][51][52][53][54] and ChIP-seq techniques [55][56][57] lots of related methods for discovering TFBSs have been put forward.…”
Section: Various Algorithms and Techniques Applying For TFBSs Prediction (mentioning; confidence: 99%)

“…Given a set of DNA sequences, these programs search for characteristic motifs using two major approaches: profile-based and consensus-based methods. In particular, the recent increase in data size due to the ChIP-Seq technique led to the development of methods that can accept greater than thousands of DNA sequences (Sharov and Ko, 2009; Li, 2008; Heinz et al, 2010; Kulakovskiy et al, 2010; Bailey, 2011; Machanick and Bailey, 2011; Reid and Wernisch, 2011; Ma et al, 2012; Hartmann et al, 2013). Most of these software programs focused on reductions in computational time, for example, by subsampling input data (e.g., MEME-ChIP (Machanick and Bailey, 2011)), accelerating expectation-maximization steps in profile optimization (e.g., ChIPMunk (Kulakovskiy et al, 2010) and STEME (Reid and Wernisch, 2011), which use a greedy approach and suffix array, respectively), or using enriched sequences as starting points for the motif search (e.g., DREME (Bailey, 2011), cERMIT (Georgiev et al, 2010), and HOMER (Heinz et al, 2010)).…”
Section: Introduction (mentioning; confidence: 99%)

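Index-based speed-ups of the EM step rest on avoiding redundant work. A minimal sketch of one such idea, assuming a two-component mixture model (motif vs. uniform background) in which each W-mer's posterior depends only on its own letters: identical W-mers then share a posterior, so each distinct W-mer can be scored once and weighted by its count. A plain `collections.Counter` stands in for the index here; the STEME abstract describes a suffix tree, which can also share work between W-mers with common prefixes, and this code is not STEME's algorithm. The names `count_wmers`, `tcm_em_step` and `lam` (the motif mixing weight), and the toy data, are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"
IDX = {b: i for i, b in enumerate(ALPHABET)}


def count_wmers(seqs, w):
    """Map each distinct length-w window to its number of occurrences.
    (A Counter is a simple stand-in for a suffix-tree index.)"""
    return Counter(s[i:i + w] for s in seqs for i in range(len(s) - w + 1))


def tcm_em_step(wmer_counts, pwm, lam, bg=0.25, pseudocount=0.1):
    """One EM step of a two-component mixture (motif PWM vs uniform background).

    Each distinct W-mer is scored once; its contribution is multiplied by its
    count, which gives the same result as scoring every copy separately.
    """
    w = len(pwm)
    counts = [[pseudocount] * len(ALPHABET) for _ in range(w)]
    p_bg = bg ** w
    expected_sites = 0.0
    for wmer, mult in wmer_counts.items():
        p_motif = 1.0
        for j, base in enumerate(wmer):
            p_motif *= pwm[j][IDX[base]]
        # E-step: posterior probability that this W-mer is a motif site.
        z = lam * p_motif / (lam * p_motif + (1 - lam) * p_bg)
        expected_sites += mult * z
        # M-step accumulation, weighted by multiplicity.
        for j, base in enumerate(wmer):
            counts[j][IDX[base]] += mult * z
    new_pwm = [[c / sum(row) for c in row] for row in counts]
    new_lam = expected_sites / sum(wmer_counts.values())
    return new_pwm, new_lam


seqs = ["GGCTATAGGC", "CCTATACGGG", "GCGCTATACC"]
wc = count_wmers(seqs, 4)
print(sum(wc.values()), len(wc))  # 21 15
pwm = [[0.7 if b == c else 0.1 for b in ALPHABET] for c in "TATA"]
lam = 0.1
for _ in range(5):
    pwm, lam = tcm_em_step(wc, pwm, lam)
cons = "".join(ALPHABET[max(range(4), key=row.__getitem__)] for row in pwm)
print(cons)  # TATA
```

On this toy input, 21 windows collapse to 15 distinct W-mers; on large, repetitive ChIP-seq data sets the saving from scoring each distinct W-mer once can be much larger.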
“…Whereas existing methods can partly represent motif ambiguity, these methods are unable to directly answer whether a given TF binds to a specific sequence pattern. For example, the most popular software program MEME (Bailey and Elkan, 1994) and other recently developed software programs (Reid and Wernisch, 2011; Zhang et al, 2013), which adopt the expectation-maximization algorithm, iteratively enrich DNA sequences that contain possible DNA-binding motifs and often converges to a local optimum. Given the nature of this algorithm, discovered DNA-binding motifs can miss non-canonical but significant motifs that were removed from the enriched dataset during the computation.…”
Section: Introduction (mentioning; confidence: 99%)