Time Series Motifs Statistical Significance

Castro, Nuno Constantino; Azevedo, Paulo J.

doi:10.1137/1.9781611972818.59

Cited by 13 publications

(9 citation statements)

References 38 publications

“…Instead, we have assumed the same length for the pair of segments forming a motif pair. This assumption is well motivated, as practically all existing motif discovery algorithms operate under such constraint (e.g., Lin et al., 2002; Chiu et al., 2003; Tanaka et al., 2005; Mueen et al., 2009; Castro & Azevedo, 2011; Mueen, 2013; Yingchareonthawornchai et al., 2013). It is also motivated for the case where we are interested in pairs of segments of different length, as the most common way to compute the dissimilarity between such segments is by re-sampling them to have the same length.…”

Section: Results (mentioning)

confidence: 99%

Ranking and significance of variable-length similarity-based time series motifs

Serrà, Serra, Corral, et al. 2016

Expert Systems with Applications

The detection of very similar patterns in a time series, commonly called motifs, has received continuous and increasing attention from diverse scientific communities. In particular, recent approaches for discovering similar motifs of different lengths have been proposed. In this work, we show that such variable-length similarity-based motifs cannot be directly compared, and hence ranked, by their normalized dissimilarities. Specifically, we find that length-normalized motif dissimilarities still have intrinsic dependencies on the motif length, and that lowest dissimilarities are particularly affected by this dependency. Moreover, we find that such dependencies are generally non-linear and change with the considered data set and dissimilarity measure. Based on these findings, we propose a solution to rank those motifs and measure their significance. This solution relies on a compact but accurate model of the dissimilarity space, using a beta distribution with three parameters that depend on the motif length in a non-linear way. We believe the incomparability of variable-length dissimilarities could go beyond the field of time series, and that similar modeling strategies as the one used here could be of help in a more broad context.

“…So far, this had been done by meticulous visual inspection, which is bounded by the complexity of the data and the inherent biases of our perception. Relying on our time series representation, these explorations could be done using de-novo motif discovery algorithms, in which a sequence dataset is searched for statistically overrepresented segments in a fast, systematic, and unbiased manner [53, 54]. Such modular decomposition approaches proved to be transformative in dealing with large volumes of data from sequencing and structural studies of DNA, RNA, and proteins [55–57].…”

Section: Discussion (mentioning)

confidence: 99%

Template-based mapping of dynamic motifs in tissue morphogenesis

2020

Tissue morphogenesis relies on repeated use of dynamic behaviors at the levels of intracellular structures, individual cells, and cell groups. Rapidly accumulating live imaging datasets make it increasingly important to formalize and automate the task of mapping recurrent dynamic behaviors (motifs), as it is done in speech recognition and other data mining applications. Here, we present a "template-based search" approach for accurate mapping of sub- to multi-cellular morphogenetic motifs using a time series data mining framework. We formulated the task of motif mapping as a subsequence matching problem and solved it using dynamic time warping, while relying on high throughput graph-theoretic algorithms for efficient exploration of the search space. This formulation allows our algorithm to accurately identify the complete duration of each instance and automatically label different stages throughout its progress, such as cell cycle phases during cell division. To illustrate our approach, we mapped cell intercalations during germband extension in the early Drosophila embryo. Our framework enabled statistical analysis of intercalary cell behaviors in wild-type and mutant embryos, comparison of temporal dynamics in contracting and growing junctions in different genotypes, and the identification of a novel mode of iterative cell intercalation. Our formulation of tissue morphogenesis using time series opens new avenues for systematic decomposition of tissue morphogenesis.

“…Approximate fixed-length motif discovery is largely based upon random projection (CK Algorithm [14]) and Symbolic Aggregate Approximation or SAX [2,15] techniques (discussed further in Section 2.1.1). Of note is the use of iSAX in the MrMotif [16,17] algorithm that derives a set of top-K motifs for a fixed length through increasing SAX resolutions.…”

Section: Literature Review (mentioning)

confidence: 99%

Side-Length-Independent Motif (SLIM): Motif Discovery and Volatility Analysis in Time Series—SAX, MDL and the Matrix Profile

2022

As the availability of big data-sets becomes more widespread so the importance of motif (or repeated pattern) identification and analysis increases. To date, the majority of motif identification algorithms that permit flexibility of sub-sequence length do so over a given range, with the restriction that both sides of an identified sub-sequence pair are of equal length. In this article, motivated by a better localised representation of variations in time series, a novel approach to the identification of motifs is discussed, which allows for some flexibility in side-length. The advantages of this flexibility include improved recognition of localised similar behaviour (manifested as motif shape) over varying timescales. As well as facilitating improved interpretation of localised volatility patterns and a visual comparison of relative volatility levels of series at a globalised level. The process described extends and modifies established techniques, namely SAX, MDL and the Matrix Profile, allowing advantageous properties of leading algorithms for data analysis and dimensionality reduction to be incorporated and future-proofed. Although this technique is potentially applicable to any time series analysis, the focus here is financial and energy sector applications where real-world examples examining S&P500 and Open Power System Data are also provided for illustration.
