Traditionally, DBMSs ship with hundreds of configuration parameters. Because database performance depends heavily on appropriate settings of these parameters, DBAs spend much of their time and effort searching for the best parameter values to tune the performance of the application of interest. In many cases, they rely on their experience and rules of thumb. However, time and effort may be wasted tuning parameters that have little or no effect. Moreover, tuning results vary with the expertise of the DBA, and skilled DBAs are increasingly rare and expensive to employ. To address these problems, we present a Statistical Approach for Ranking Database parameters (SARD), which is based on the Plackett & Burman statistical design methodology. SARD takes the query workload and the number of configuration parameters as inputs and, using only a linear number of experiments, generates a ranking of database parameters based on their relative impact on DBMS performance. Preliminary experimental results using TPC-H and PostgreSQL show that the SARD-generated ranking can correctly identify critical configuration parameters.
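The ranking idea can be illustrated with a minimal Python sketch of a Plackett & Burman screening design. It is not SARD itself: it assumes the standard 8-run design for 7 two-level parameters, and `synthetic_cost` is a hypothetical stand-in for actually running the query workload under one configuration.

```python
def plackett_burman_8():
    """Return the 8-run, 7-factor Plackett-Burman design as rows of +1/-1."""
    generator = [1, 1, 1, -1, 1, -1, -1]  # standard first row for 8 runs
    # Runs 1-7 are cyclic right shifts of the generator row.
    rows = [generator[-shift:] + generator[:-shift] for shift in range(7)]
    rows.append([-1] * 7)  # final run: every parameter at its low level
    return rows


def rank_parameters(design, responses):
    """Rank parameter indices by the magnitude of their estimated effect."""
    num_params = len(design[0])
    # Effect of parameter j: sum of responses, each signed by j's level in that run.
    effects = [
        sum(row[j] * y for row, y in zip(design, responses))
        for j in range(num_params)
    ]
    ranking = sorted(range(num_params), key=lambda j: abs(effects[j]), reverse=True)
    return ranking, effects


def synthetic_cost(run):
    # Hypothetical measurement: only parameters 0 and 3 influence the result.
    return 100 + 10 * run[0] + 3 * run[3]


design = plackett_burman_8()
responses = [synthetic_cost(run) for run in design]
ranking, effects = rank_parameters(design, responses)
print("most influential parameters:", ranking[:2])
```

Eight runs cover seven parameters, which is the sense in which the number of experiments grows linearly with the number of parameters rather than exponentially as in a full factorial design.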
We present FlashStore, a high-throughput persistent key-value store that uses flash memory as a non-volatile cache between RAM and hard disk. FlashStore is designed to store the working set of key-value pairs on flash and use one flash read per key lookup. As the working set changes over time, space is made for the current working set by destaging recently unused key-value pairs to hard disk and recycling pages in the flash store. FlashStore organizes key-value pairs in a log structure on flash to exploit faster sequential write performance. It uses an in-memory hash table to index them, with hash collisions resolved by a variant of cuckoo hashing. The in-memory hash table stores compact key signatures instead of full keys, trading RAM usage against false flash read operations. FlashStore can be used as a high-throughput persistent key-value storage layer for a broad range of server-class applications. We compare FlashStore with BerkeleyDB, an embedded key-value store application, running on hard disk and flash separately, to bring out the performance gain of FlashStore not only from using flash as a cache above hard disk but also from its use of flash-aware algorithms. We use real-world data traces from two data center applications, namely the Xbox LIVE Primetime online multi-player game and inline storage deduplication, to drive and evaluate the design of FlashStore on traditional and low-power server platforms. FlashStore outperforms BerkeleyDB by up to 60x on throughput (ops/sec), up to 50x on energy efficiency (ops/Joule), and up to 85x on cost efficiency (ops/sec/dollar) on the evaluated datasets.
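The combination of a log-structured store and a RAM index of compact signatures can be sketched in Python as follows. This is an illustration under simplifying assumptions, not FlashStore's implementation: a Python list stands in for flash, collisions are handled with plain per-bucket lists rather than the cuckoo hashing variant the abstract mentions, and the class and field names are hypothetical.

```python
import hashlib


class SignatureIndexedLog:
    """Append-only log (stand-in for flash) indexed by compact key signatures in RAM."""

    def __init__(self, num_buckets=1024, sig_bits=16):
        self.log = []  # entries are (key, value); the list index plays the flash offset
        self.buckets = [[] for _ in range(num_buckets)]  # RAM: (signature, offset) pairs
        self.sig_mask = (1 << sig_bits) - 1
        self.flash_reads = 0  # counts simulated flash reads, including false ones

    def _hash(self, key):
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # Low bits choose the bucket; higher bits form the compact signature.
        return h % len(self.buckets), (h >> 32) & self.sig_mask

    def put(self, key, value):
        bucket, sig = self._hash(key)
        offset = len(self.log)
        self.log.append((key, value))  # sequential write at the log tail
        self.buckets[bucket].insert(0, (sig, offset))  # newest entry is checked first

    def get(self, key):
        bucket, sig = self._hash(key)
        for entry_sig, offset in self.buckets[bucket]:
            if entry_sig != sig:
                continue  # signature mismatch is resolved in RAM, no flash read
            self.flash_reads += 1
            stored_key, value = self.log[offset]
            if stored_key == key:
                return value
            # Otherwise a different key shared the signature: a false flash read.
        return None


store = SignatureIndexedLog()
store.put("user:1", "alice")
store.put("user:1", "alicia")  # an update appends a new record to the log
print(store.get("user:1"), store.flash_reads)
```

Shrinking `sig_bits` saves RAM per entry but makes signature matches between different keys, and therefore false flash reads, more likely, which is the tradeoff the abstract describes.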
Solid State Disks (SSDs) using NAND flash memory are increasingly being adopted in the high-end servers of datacenters to improve the performance of I/O-intensive applications. Compared to traditional enterprise-class hard disks, SSDs provide faster read performance, lower cooling cost, and higher power efficiency. However, the write performance of a flash-based SSD can be up to an order of magnitude slower than its read performance. Furthermore, frequent write operations degrade the lifetime of flash memory. A nonvolatile cache can greatly help to solve these problems. Although a RAM cache is relatively high in cost, it has successfully eliminated the performance gap between fast CPU and slow magnetic disk. Similarly, a nonvolatile cache in an SSD can alleviate the disparity between the flash memory's read and write performance. A small write cache that reduces the number of flash block erase operations can lead to substantial performance gain for write-intensive applications and can extend the overall lifetime of flash-based SSDs. This paper presents a novel write caching algorithm, the Large Block CLOCK (LB-CLOCK) algorithm, which considers 'recency' and 'block space utilization' metrics to make cache management decisions. LB-CLOCK dynamically varies the priority between these two metrics to adapt to changes in workload characteristics. Our simulation-based experimental results show that LB-CLOCK outperforms the best known existing flash caching algorithms for a wide range of workloads.
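How the two metrics might interact can be sketched with a small block-level write cache in Python. The abstract does not spell out how LB-CLOCK combines recency and block space utilization, so the rule used here is an assumption for illustration only: blocks whose recency bit is clear are eviction candidates, and among them the block holding the most cached pages is flushed. The class name and the simplified handling of recency bits (no rotating clock hand) are likewise hypothetical.

```python
class BlockWriteCache:
    """Write cache managed per flash block, using recency bits and block utilization."""

    def __init__(self, capacity_pages, pages_per_block):
        self.capacity = capacity_pages
        self.pages_per_block = pages_per_block
        self.blocks = {}  # block_id -> {"pages": set of cached pages, "ref": recency bit}
        self.cached_pages = 0
        self.evicted = []  # block ids flushed to flash, in eviction order

    def write(self, page):
        block_id = page // self.pages_per_block
        block = self.blocks.get(block_id)
        if block is not None and page in block["pages"]:
            block["ref"] = True  # overwrite absorbed by the cache
            return
        if self.cached_pages >= self.capacity:
            self._evict_block()
        block = self.blocks.setdefault(block_id, {"pages": set(), "ref": False})
        block["pages"].add(page)
        block["ref"] = True
        self.cached_pages += 1

    def _evict_block(self):
        # Candidates are blocks that were not written since their bit was last cleared.
        candidates = [b for b, state in self.blocks.items() if not state["ref"]]
        if not candidates:
            # Simplified second chance: clear every bit, then consider all blocks.
            for state in self.blocks.values():
                state["ref"] = False
            candidates = list(self.blocks)
        # Assumed rule: flush the candidate with the highest space utilization.
        victim = max(candidates, key=lambda b: len(self.blocks[b]["pages"]))
        self.cached_pages -= len(self.blocks[victim]["pages"])
        del self.blocks[victim]
        self.evicted.append(victim)


cache = BlockWriteCache(capacity_pages=4, pages_per_block=4)
for page in [0, 1, 4, 8, 12]:
    cache.write(page)
print(cache.evicted)
```

Evicting a whole block at once means that all of its cached pages can be written out together, which is the kind of reduction in block erase operations that the abstract attributes to a write cache.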
We present SkimpyStash, a RAM space skimpy key-value store on flash-based storage, designed for high throughput, low latency server applications. The distinguishing feature of SkimpyStash is the design goal of extremely low RAM footprint at about 1 (± 0.5) byte per key-value pair, which is more aggressive than earlier designs. SkimpyStash uses a hash table directory in RAM to index key-value pairs stored in a log-structured manner on flash. To break the barrier of a flash pointer (say, 4 bytes) worth of RAM overhead per key, it "moves" most of the pointers that locate each key-value pair from RAM to flash itself. This is realized by (i) resolving hash table collisions using linear chaining, where multiple keys that resolve (collide) to the same hash table bucket are chained in a linked list, and (ii) storing the linked lists on flash itself with a pointer in each hash table bucket in RAM pointing to the beginning record of the chain on flash, hence incurring multiple flash reads per lookup. Two further techniques are used to improve performance: (iii) two-choice based load balancing to reduce wide variation in bucket sizes (hence, chain lengths and associated lookup times), and a Bloom filter in each hash table directory slot in RAM to disambiguate the choice during lookup, and (iv) a compaction procedure to pack bucket chain records contiguously onto flash pages so as to reduce flash reads during lookup. The average bucket size is the critical design parameter that serves as a powerful knob for making a continuum of tradeoffs between low RAM usage and low lookup latencies. Our evaluations on commodity server platforms with real-world data center applications show that SkimpyStash provides throughputs from a few 10,000s to upwards of 100,000 get-set operations/sec.
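Techniques (i) and (ii) can be sketched in Python as follows. This is a minimal illustration, not SkimpyStash's implementation: a Python list stands in for the flash log, the names are hypothetical, and two-choice load balancing, Bloom filters, and compaction are left out.

```python
import hashlib


class ChainedFlashStore:
    """RAM holds one pointer per bucket; the collision chains live in the flash log."""

    def __init__(self, num_buckets=8):
        self.log = []  # flash stand-in: records are (key, value, next_offset)
        self.directory = [None] * num_buckets  # RAM: offset of each chain's head record
        self.flash_reads = 0

    def _bucket(self, key):
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self.directory)

    def set(self, key, value):
        bucket = self._bucket(key)
        offset = len(self.log)
        # The new record points at the previous chain head, so the chain is stored on flash.
        self.log.append((key, value, self.directory[bucket]))
        self.directory[bucket] = offset

    def get(self, key):
        offset = self.directory[self._bucket(key)]
        while offset is not None:
            self.flash_reads += 1  # every hop along the chain costs one flash read
            stored_key, value, next_offset = self.log[offset]
            if stored_key == key:
                return value  # the newest record for a key is reached first
            offset = next_offset
        return None


store = ChainedFlashStore(num_buckets=1)  # a single bucket forces one long chain
for name in ["a", "b", "c"]:
    store.set(name, name.upper())
print(store.get("a"), store.flash_reads)
```

With fewer buckets the directory needs less RAM per key but chains grow and lookups need more flash reads, which mirrors the role of average bucket size as the tuning knob described in the abstract.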