High performance temporal indexing on modern hardware

Lomet, David; Nawab, Faisal

doi:10.1109/icde.2015.7113368

Cited by 8 publications

(5 citation statements)

References 24 publications


“…A recent study proposed a high performance temporal index similar to time-split B-tree (TSB-tree), called TSBw-tree, which focuses on transaction time databases [55]. Binna et al. [11] present the Height Optimized Trie (HOT), a general-purpose index structure for main-memory database systems, while Leis et al. [46] describe an in-memory adaptive Radix indexing technique that is designed for modern hardware.…”

Section: Related Work (mentioning)

confidence: 99%

“…Given the evolution of CPU performance, where the processor clock speed is not increasing due to the power wall constraint, algorithmic speedups can now mainly come by exploiting parallelism [7,12,31,34,55,60,69,71,78,83,87,88]. This involves (i) parallelism across compute nodes (e.g., using Spark) [48,85], where the main goal is to scale to datasets that cannot be easily handled by a single node, and (ii) parallelism inside a single compute node (e.g., ex-…” [Footnote 1: A data series, or data sequence, is an ordered sequence of data points.]

Section: Introduction (mentioning)

confidence: 99%


Fast Data Series Indexing for In-Memory Data

Peng, Fatourou, Palpanas

2021

Preprint


Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the modern hardware parallelization opportunities (i.e., SIMD instructions, multi-socket and multi-core architectures), in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and Dynamic Time Warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction, and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100GB datasets in ∼50msec (30-75msec across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.




“…Thread-Level Parallelism (TLP) methods, like multiple independent cores and hyper-threads, are commonly used to increase algorithm efficiency [17]. A recent study proposed a high performance temporal index similar to time-split B-tree (TSB-tree), called TSBw-tree, which focuses on transaction time databases [29]. However, this is designed for temporal data, which are 2-dimensional, while in our case, data series can have thousands of dimensions (i.e., the length of the sequence).…”

Section: Related Work (mentioning)

confidence: 99%

ParIS+: Data Series Indexing on Multi-core Architectures

Peng, Fatourou, Palpanas

2020

IEEE Trans. Knowl. Data Eng.


Data series similarity search is a core operation for several data series analysis applications across many different domains. Nevertheless, even state-of-the-art techniques cannot provide the time performance required for large data series collections. We propose ParIS and ParIS+, the first disk-based data series indices carefully designed to inherently take advantage of multi-core architectures, in order to accelerate similarity search processing times. Our experiments demonstrate that ParIS+ completely removes the CPU latency during index construction for disk-resident data, and for exact query answering is up to 1 order of magnitude faster than the current state-of-the-art index scan method, and up to 3 orders of magnitude faster than the optimized serial scan method. ParIS+ (which is an evolution of the ADS+ index) owes its efficiency to the effective use of multi-core and multi-socket architectures, in order to distribute and execute in parallel both index construction and query answering, and to the exploitation of the Single Instruction Multiple Data (SIMD) capabilities of modern CPUs, in order to further parallelize the execution of instructions inside each core.


“…A recent study proposed a high performance temporal index similar to time-split B-tree (TSB-tree), called TSBw-tree, which focuses on transaction time databases [40]. Binna et al. [41] present the Height Optimized Trie (HOT), a general-purpose index structure for main-memory database systems, while Leis et al. [42] describe an in-memory adaptive Radix indexing technique that is designed for modern hardware.…”

Section: Related Work (mentioning)

confidence: 99%

MESSI: In-Memory Data Series Indexing

Peng, Fatourou, Palpanas

2020

2020 IEEE 36th International Conference on Data Engineering (ICDE)


Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the modern hardware parallelization opportunities (i.e., SIMD instructions, multi-core and multi-socket architectures), in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction, and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100GB datasets in ∼50msec (30-75msec across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.

