Quickly generating billion-record synthetic databases

Gray, Jim; Sundaresan, Prakash; Englert, Susanne; Bacławski, Kenneth; Weinberger, P.

doi:10.1145/191843.191886

Cited by 138 publications

(126 citation statements)

References 7 publications

Mentioning: 126

“…To generate the Zipfian distribution, our system utilizes YCSB tool [14], which employs the algorithm for generating a Zipfian-distributed sequence from Gray et al [17]. In our case, we use a clustered distribution: i.e., the popular items are clustered together towards 0 (smaller values are more popular).…”

Section: Data and Workload Generator (mentioning)

confidence: 99%

Performance Evaluation of Range Queries in Key Value Stores

Pirzadeh, Tatemura, Po, et al. 2012

J Grid Computing

Recently there has been a considerable increase in the number of different Key-Value stores, for supporting data storage and applications on the cloud environment. While all these solutions try to offer highly available and scalable services on the cloud, they are significantly different with each other in terms of the architecture and types of the applications, they try to support. Considering three widely-used such systems: Cassandra, HBase and Voldemort; in this paper we compare them in terms of their support for different types of query workloads. We are mainly focused on the range queries. Unlike HBase and Cassandra that have built-in support for range queries, Voldemort does not support this type of queries via its available API. For this matter, practical techniques are presented on top of Voldemort to support range queries. Our performance evaluation is based on mixed query workloads, in the sense that they contain a combination of short and long range queries, beside other types of typical queries on key-value stores such as lookup and update. We show that there are trade-offs in the performance of the selected system and scheme, and the types of the query workloads that can be processed efficiently.
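The citation statement above describes a Zipfian key distribution in which the most popular items are clustered towards 0. As a rough illustration only, the sketch below draws such keys with a plain inverse-CDF lookup. This is a swapped-in textbook technique, not the algorithm from Gray et al. and not the YCSB implementation; the function name `make_zipf_sampler`, the parameter `theta`, and its default value are all assumptions made for the example.

```python
import bisect
import itertools
import random


def make_zipf_sampler(n, theta=0.99, seed=None):
    """Return a function drawing integers in [0, n).

    Key k is drawn with probability proportional to 1 / (k + 1) ** theta,
    so key 0 is the most popular and popularity falls as k grows.
    `theta` is a hypothetical skew parameter chosen for illustration.
    """
    rng = random.Random(seed)
    # Unnormalized Zipf weights for ranks 1..n, stored at indices 0..n-1.
    weights = [1.0 / (k + 1) ** theta for k in range(n)]
    # Running totals form an unnormalized cumulative distribution.
    cdf = list(itertools.accumulate(weights))
    total = cdf[-1]

    def sample():
        # Pick a point in [0, total) and find the first key whose
        # cumulative weight exceeds it.
        u = rng.random() * total
        # min() guards against a floating-point edge case at the top end.
        return min(bisect.bisect_right(cdf, u), n - 1)

    return sample


if __name__ == "__main__":
    sample = make_zipf_sampler(1000, seed=42)
    draws = [sample() for _ in range(10)]
    print(draws)
```

Setup costs O(n) time and memory for the cumulative table, and each draw costs O(log n). That is acceptable for small key spaces, but it is exactly the cost a closed-form generator avoids when the key space reaches billions of records.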

“…We first generate a set of data nodes whose access frequencies follow the Zipf(l, h) distribution [31], where l is the mean and h increases with the skewness of the data. The size of a data node in terms of buckets is randomly selected from the range 10-20.…”

Section: Performance Evaluation (mentioning)

confidence: 99%

Efficient index and data allocation for wireless broadcast services

Chen 2007

Data & Knowledge Engineering

“…The access frequencies of the data items are generated based on the Zipf distribution [11]. In the Zipf distribution, the access frequencies of the data items follow the 80/20 rule that 80 percent clients are usually interested in 20 percent data items.…”

Section: Simulation Model (mentioning)

confidence: 99%

An Efficient Algorithm for Near Optimal Data Allocation on Multiple Broadcast Channels

Hsu, Lee, Chen 2005

Distrib Parallel Databases

In a wireless environment, the bandwidth of the channels and the energy of the portable devices are limited. Data broadcast has become an excellent method for efficient data dissemination. In this paper, the problem for generating a broadcast program of a set of data items with the associated access frequencies on multiple channels is explored. In our approach, a minimal expected average access time of the broadcast data items is first derived. The broadcast program is then generated, which minimizes the minimal expected average access time. Simulation is performed to compare the performance of our approach with two existing approaches. The result of the experiments shows that our approach outperforms others and is in fact close to the optimal.
