Persistent memory (PM) is fundamentally changing the way database index structures are built by enabling persistence, high performance, and (near) instant recovery, all on the memory bus. Prior work has proposed many techniques to tailor index structure designs for PM, but these were mostly evaluated through simulation on volatile DRAM because real PM hardware was unavailable. Until now, it has been unclear how these techniques actually perform on real PM hardware. Using the recently released Intel Optane DC Persistent Memory, this paper provides, for the first time, a comprehensive evaluation of recent persistent index structures. We focus on B+-Tree-based range indexes and carefully choose four representative index structures for evaluation: wBTree, NV-Tree, BzTree and FPTree. These four tree structures cover a wide, representative range of techniques that are essential building blocks of PM-based index structures. For fair comparison, we used a unified programming model for all trees and developed PiBench, a benchmarking framework that targets PM-based indexes. Through empirical evaluation using representative workloads, we identify key effective techniques, insights and caveats to guide the design of future PM-based index structures.
Finding the best way to leverage non-volatile memory (NVM) in modern database systems is still an open problem. The answer is far from trivial, since the clear boundary between memory and storage present in most systems seems incompatible with the intrinsic memory-storage duality of NVM. Rather than treating NVM either solely as memory or solely as storage, in this work we propose how NVM can be used as both simultaneously in the context of modern database systems. We design a persistent buffer pool on NVM, enabling pages to be directly read and written by the CPU (like memory) while recovering corrupted pages after a failure (like storage). The main benefits of our approach are easy integration into existing database architectures, reduced costs (by replacing DRAM with NVM), and faster peak-performance recovery.
Modern applications employ key-value stores (KVS) at one or more points of their software stack, often as a caching system or a storage manager. Many of these applications also require a high degree of responsiveness and performance predictability. However, most KVS share design decisions that focus on improving throughput metrics, at times by sacrificing latency. While latency can occasionally be reduced by over-provisioning hardware, this entails a significant increase in cost. In this paper we present RStore, a KVS that focuses on low tail latency as its primary goal, while also enabling efficient usage of hardware resources. To that end, we argue in favor of techniques such as an asynchronous programming model, message-passing communication, and log-structured storage on modern hardware. Throughout the paper we discuss these and other design decisions of RStore that differ from those of more traditional systems. Our evaluation shows that RStore scales its throughput with an increasing number of cores while maintaining robust behavior with low and predictable latency.
The last decade has seen tremendous developments in memory and storage technologies, starting with Flash Memory and continuing with the upcoming Storage-Class Memories. Combined with an explosion of data processing, data analytics, and machine learning, this led to a segmentation of the memory and storage market. Consequently, the traditional storage hierarchy, as we know it today, might be replaced by a multitude of storage hierarchies, with potentially different depths, each tailored for specific workloads. In this context, we explore in this "Kurz Erklärt" the state of memory technologies and reflect on their future use with a focus on data management systems.
The emergence of persistent memory (PM), such as Intel Optane DC Persistent Memory Modules (DCPMM), opened up many opportunities for building high-performance indexes directly on PM. However, the many PM indexes proposed by prior work were evaluated using PM emulation on DRAM, so it was not clear how they would perform on real PM hardware. Moreover, they typically used ad hoc, in-house benchmarks and did not collect PM-specific hardware metrics, which are key performance indicators and are instrumental for users and developers in understanding the performance behavior of PM indexes. These issues call for a systematic, fair and reproducible approach to evaluating PM indexes. This demonstration highlights the principles and lessons learned from our recent evaluation of PM indexes on real DCPMM and showcases PiBench, a unified benchmarking framework that enables fair and reproducible evaluation of PM indexes. In addition to common metrics, PiBench uniquely integrates monitoring tools to collect PM-specific hardware counters, allowing in-depth performance analysis. Our demonstration is enabled by PiBench Online, a new interactive system built on top of PiBench. Using PiBench Online, users can upload their own index implementations, run preset or customized workloads, and analyze results interactively, all through an easy-to-use web interface. PiBench is open-source and PiBench Online is deployed at https://pibench.org. We hope PiBench Online can promote fair comparison and reproducibility in the database and systems communities.