2013
DOI: 10.14778/2556549.2556552

Discovering longest-lasting correlation in sequence databases

Abstract: Most existing work on sequence databases uses correlation (e.g., Euclidean distance and Pearson correlation) as a core function for various analytical tasks. Typically, it requires users to set a length for the similarity queries. However, there is no steady way to define the proper length for different application needs. In this work we focus on discovering longest-lasting highly correlated subsequences in sequence databases, which is particularly useful in helping those analyses without prior knowledge about t…

Cited by 26 publications (32 citation statements)
References 31 publications

“…Various indexing techniques for querying the correlations of static time-series data stored in a centralized system have been proposed in [11], [12], [20], [23]. Such techniques are not suitable for our dynamic environment, where the index maintenance cost incurs high processing latency.…”
Section: Related Work (mentioning)
confidence: 99%
“…GC: it is based on distributed group-based join [15], which optimizes the sliding window replication and enables incremental correlation computing [11]. GC computes pair-wise correlations and then performs significance tests.…”
Section: A Baselines (mentioning)
confidence: 99%
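
The statement above mentions incremental correlation computing over sliding windows. As a rough illustration only, the sketch below keeps running sums so that Pearson correlation can be updated in constant time per new pair of values. The class name `SlidingPearson` and its methods are hypothetical, and this is a generic running-sums technique, not the GC baseline or the method of the cited work.

```python
import math
from collections import deque


class SlidingPearson:
    """Hypothetical helper: Pearson correlation over the last `window` pairs."""

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must be at least 2")
        self.window = window
        self.pairs = deque()
        # Running sums of x, y, x*x, y*y and x*y over the current window.
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def push(self, x, y):
        """Add one new pair and evict the oldest pair if the window is full."""
        self.pairs.append((x, y))
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        if len(self.pairs) > self.window:
            ox, oy = self.pairs.popleft()
            self.sx -= ox
            self.sy -= oy
            self.sxx -= ox * ox
            self.syy -= oy * oy
            self.sxy -= ox * oy

    def correlation(self):
        """Return the correlation of the current window, or None if undefined."""
        n = len(self.pairs)
        if n < 2:
            return None
        num = n * self.sxy - self.sx * self.sy
        den_x = n * self.sxx - self.sx * self.sx
        den_y = n * self.syy - self.sy * self.sy
        if den_x <= 0 or den_y <= 0:
            # At least one series is constant within the window.
            return None
        return num / math.sqrt(den_x * den_y)


tracker = SlidingPearson(window=3)
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]:
    tracker.push(x, y)
window_r = tracker.correlation()
```

Running sums avoid rescanning the window on every update, at the cost of some numerical drift on very long streams; a production system would likely need periodic recomputation or a more stable update rule.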
“…To reasonably compare the similarity of two time series, the sequence values should be Z-normalized [15,26]. The Z-normalization is to transform a time series into its normalized form whose mean is approximately zero, and the standard deviation is in a range close to 1.…”
Section: Z-normalization (mentioning)
confidence: 99%
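
To make the Z-normalization described above concrete, here is a minimal sketch. It assumes the population standard deviation (dividing by n) and maps a constant series to all zeros; both choices, and the function name `z_normalize`, are assumptions rather than details taken from the citing paper.

```python
import math


def z_normalize(series):
    """Return a copy of `series` rescaled to mean 0 and standard deviation 1."""
    n = len(series)
    if n == 0:
        raise ValueError("series must be non-empty")
    mean = sum(series) / n
    # Population variance (divide by n); an assumption of this sketch.
    variance = sum((value - mean) ** 2 for value in series) / n
    std = math.sqrt(variance)
    if std == 0:
        # A constant series has no variation to rescale; map it to zeros.
        return [0.0] * n
    return [(value - mean) / std for value in series]


normalized = z_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

For this input the mean is 5 and the standard deviation is 2, so each value is shifted by 5 and divided by 2.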
“…As an example, the normalized Euclidean distance of two longer subsequences is more likely to be larger than the normalized Euclidean distance of two shorter subsequences. Thus in longest-lasting correlation query, we use Pearson correlation as the underlying similarity measure since it not only reveals the true similarity of time series by Z-normalization but also makes the similarity comparison fairer by length normalization [15]. For clarity, the definition of Pearson correlation is given as follows.…”
Section: Z-normalization (mentioning)
confidence: 99%
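
The link between Pearson correlation, Z-normalization and length normalization can be checked numerically. The sketch below assumes population standard deviations and uses the standard identity that, for two Z-normalized series of length n with correlation r, the squared Euclidean distance equals 2n(1 - r). The distance therefore grows with n for a fixed r, while r itself does not. The function names are hypothetical, and the citing paper's own definition is not reproduced here.

```python
import math


def z_normalize(series):
    """Rescale `series` to mean 0 and population standard deviation 1."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        raise ValueError("correlation is undefined for a constant series")
    return [(v - mean) / std for v in series]


def pearson(x, y):
    """Pearson correlation as the mean product of Z-normalized values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    zx, zy = z_normalize(x), z_normalize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def znorm_squared_distance(x, y):
    """Squared Euclidean distance between the Z-normalized series."""
    zx, zy = z_normalize(x), z_normalize(y)
    return sum((a - b) ** 2 for a, b in zip(zx, zy))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson(x, y)
d2 = znorm_squared_distance(x, y)
# Identity under these assumptions: d2 == 2 * len(x) * (1 - r)
```

Dividing the squared distance by 2n and subtracting from 1 recovers r, which is one way to read the "length normalization" mentioned in the statement.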