2019
DOI: 10.1007/s00778-019-00573-w

Coconut: sortable summarizations for scalable indexes over static and streaming data series

Abstract: Many modern applications produce massive streams of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well for massive datasets in terms of performance or storage costs. We pinpoint the problem to the fact that existing summarizations of data series used for indexing cannot be sorted while keeping similar data series close to each other in the sorted order. To address this problem…
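The abstract excerpt stops before describing Coconut's solution. To illustrate the sorting problem it names, the sketch below assumes a SAX-style summarization (z-normalization, piecewise aggregate approximation, then quantization into a few bits per segment) and compares two ways of sorting the resulting words: segment by segment, and by a key that interleaves the segments' bits most significant first, in the spirit of a z-order curve. Interleaving is one way, assumed here, to obtain a sort order that respects similarity; the names `sax_word` and `interleaved_key`, the 4-segment/2-bit parameters, and the breakpoints are hypothetical choices for this example, not a definitive description of the paper's method.

```python
import bisect
import statistics

# Hypothetical parameters for this sketch: 4 segments, 2 bits (4 symbols) each.
SEGMENTS = 4
BITS = 2
# Quartiles of the standard normal distribution: 2**BITS equiprobable regions.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]


def znormalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(x - mu) / sd for x in series] if sd > 0 else [0.0] * len(series)


def paa(series, segments=SEGMENTS):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    n = len(series)
    assert n % segments == 0, "sketch assumes the length is a multiple of segments"
    w = n // segments
    return [statistics.fmean(series[i * w:(i + 1) * w]) for i in range(segments)]


def sax_word(series):
    """SAX-style word: one symbol in [0, 2**BITS) per PAA segment."""
    return [bisect.bisect_right(BREAKPOINTS, v) for v in paa(znormalize(series))]


def interleaved_key(word, bits=BITS):
    """Z-order style key: the most significant bit of every segment comes first,
    then the next bit of every segment, and so on."""
    key = 0
    for b in range(bits - 1, -1, -1):
        for symbol in word:
            key = (key << 1) | ((symbol >> b) & 1)
    return key


if __name__ == "__main__":
    # Hypothetical SAX words: r and p differ only slightly in their first segment,
    # while q differs from both in every later segment.
    words = {"r": [0, 3, 3, 3], "q": [1, 0, 0, 0], "p": [1, 3, 3, 3]}
    print(sorted(words, key=lambda k: words[k]))                   # ['r', 'q', 'p']
    print(sorted(words, key=lambda k: interleaved_key(words[k])))  # ['q', 'r', 'p']
```

Sorting segment by segment lets the first symbol dominate, so the dissimilar word q lands between the two similar words r and p; the interleaved key compares the coarse bits of all segments first and keeps r and p adjacent. As with any z-order curve, interleaving does not remove every discontinuity: words that straddle a high-order bit boundary can still be sorted far apart.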

Cited by 21 publications (10 citation statements)
References 67 publications (134 reference statements)

“…Various dimensionality reduction techniques exist for data series, which can then be scanned and filtered [38,49] or indexed and pruned [20–22,42–44,52,61,65,75,76,81,89] during query answering, including deep-learned methods [80]; for a complete discussion of such techniques, we refer the reader to two recent tutorials on the subject [25,26]. We follow the same approach of indexing the series based on their summaries, though our work is the first to exploit the parallelization opportunities offered by modern hardware in order to accelerate in-memory index construction and similarity search for data series.…”
Section: Related Work · mentioning · confidence: 99%

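Both strategies in the passage above rely on a lower bound computed from the summaries, so that most raw series never need to be read. A minimal sketch of the scan-and-filter variant, assuming PAA summaries and Euclidean distance: it uses the standard bound sqrt(n / segments) · ED(PAA(q), PAA(c)) ≤ ED(q, c), visits candidates in increasing bound order, and stops as soon as no remaining bound can beat the best distance found so far. The function names and the in-memory list standing in for a collection are hypothetical, and this is not the method of any specific cited work.

```python
import math


def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    n = len(series)
    assert n % segments == 0, "sketch assumes the length is a multiple of segments"
    w = n // segments
    return [sum(series[i * w:(i + 1) * w]) / w for i in range(segments)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(paa_a, paa_b, n):
    """sqrt(n / segments) * ED(PAA(a), PAA(b)) never exceeds ED(a, b)."""
    return math.sqrt(n / len(paa_a)) * euclidean(paa_a, paa_b)


def nearest_neighbor(query, collection, segments=4):
    """Exact 1-NN search that reads a raw series only if its bound can still win.

    Returns (index, distance, number_of_raw_series_compared).
    """
    n = len(query)
    q_paa = paa(query, segments)
    # In a real system the summaries would be precomputed and stored.
    bounds = [(paa_lower_bound(q_paa, paa(s, segments), n), i)
              for i, s in enumerate(collection)]
    bounds.sort()  # visit the most promising candidates first
    best_idx, best_dist, refined = -1, math.inf, 0
    for lb, i in bounds:
        if lb >= best_dist:
            break  # every remaining bound is at least as large: prune the rest
        d = euclidean(query, collection[i])
        refined += 1
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, refined


if __name__ == "__main__":
    query = [0, 0, 1, 1, 2, 2, 3, 3]
    collection = [
        [3, 3, 2, 2, 1, 1, 0, 0],
        [0, 0, 1, 1, 2, 2, 3, 4],
        [5] * 8,
    ]
    print(nearest_neighbor(query, collection))  # (1, 1.0, 1)
```

In the usage example only one raw series is compared: after the first candidate yields a distance of 1.0, the next-smallest bound is already larger, so the rest of the collection is pruned. An index replaces the sorted list of bounds with a tree whose nodes carry bounds for whole groups of series.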
“…In recent years, there has been much research on similarity search and the related data indexing [4]–[6]. In the context of time-series data indexing, an example of a similarity search query is finding past days in which the temperature recording is similar to today's pattern.…”
Section: Introduction · mentioning · confidence: 99%

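The temperature query in the passage above can be made concrete with a brute-force sketch. It assumes a long recording sampled at a fixed rate, cut into non-overlapping daily windows of the same length as today's series, and ranks past days by Euclidean distance; the function names and the toy data (four readings per "day") are hypothetical, and no index is used, so every past day is compared.

```python
import heapq
import math


def split_into_days(readings, per_day):
    """Cut a long recording into consecutive, non-overlapping windows of per_day readings.

    A trailing partial day is dropped.
    """
    return [readings[i:i + per_day]
            for i in range(0, len(readings) - per_day + 1, per_day)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar_days(history, today, k=3):
    """Return up to k (distance, day_index) pairs for the past days closest to today."""
    days = split_into_days(history, len(today))
    return heapq.nsmallest(k, ((euclidean(day, today), i) for i, day in enumerate(days)))


if __name__ == "__main__":
    # Hypothetical readings, 4 per "day" to keep the example short.
    history = [0, 0, 0, 0, 1, 2, 3, 4, 9, 9, 9, 9]
    today = [1, 2, 3, 5]
    print(most_similar_days(history, today, k=2))
```

Here day 1 is ranked first with distance 1.0. On realistic collections this linear scan is exactly the cost that the summarization-based indexes discussed on this page aim to avoid.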
“…In particular, we observe that clients not only focus on finding a trend (up or down) or a similar pattern in time-series data within a period of time; they also expect to obtain summarized information on such time series. The term 'summarized information' that we use in this paper differs from the 'summarizations' proposed in [6], which are representations of time-series data segments. Our term means summarized outcomes extracted from a segment of data by relevant user-defined functions.…”
Section: Introduction · mentioning · confidence: 99%

“…However, similarity search in very large data series collections is notoriously challenging [70,49,50,18,17,13,14,2] due to the high dimensionality (length) of the data series. In order to address this problem, a significant amount of effort has been dedicated by the data management research community to data series indexing techniques [51,13,14], which have led to fast and scalable similarity search [16,56,29,4,62,24,66,11,12,71,72,68,69,53,55,54,9,31,32,33].…”
mentioning · confidence: 99%

“…We note that the technique discussed above (despite its limitations) is indeed the current state of the art, and no other technique has been proposed since, even though during the same period of time we have witnessed lots of activity and a steady stream of papers on the single-length similarity search problem (e.g., [29,4,62,10,66,11,71,72,68,69,53,55,54,31,32,33]). This attests to the challenging nature of the problem we are tackling in this paper.…”
mentioning · confidence: 99%