Abstract. In many fields of research and business, data sizes are breaking the petabyte barrier. This raises new problems and opens new research opportunities for the database community. Data of this size is usually stored in large clusters or clouds. Although clouds have become very popular in recent years, there has been little work on benchmarking cloud applications. In this paper, we present a data generator for cloud-sized applications. Its architecture makes the data generator easy to extend and configure. A key feature is a high degree of parallelism that allows linear scaling to arbitrary numbers of nodes. We show how distributions, relationships, and dependencies in data can be computed in parallel with linear speedup.
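The abstract does not say how values with distributions, relationships, and dependencies can be generated without coordination between nodes. One common approach, assumed here and not taken from the paper, is to derive every value deterministically from a seed and its coordinates (table, column, row). Any worker can then compute any row independently, including the row that a foreign key points to. Because partitions share no state, the work splits evenly across workers. The names `MASTER_SEED`, `cell_rng`, `customer_age`, `order_row`, `partitions`, the `customer`/`orders` tables, and their sizes are all hypothetical.

```python
import hashlib
import random

MASTER_SEED = 42       # hypothetical project-wide seed
NUM_CUSTOMERS = 1000   # hypothetical size of the referenced table


def cell_rng(table: str, column: str, row: int) -> random.Random:
    """Derive a reproducible RNG for one cell purely from its coordinates."""
    key = f"{MASTER_SEED}/{table}/{column}/{row}".encode()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed)


def customer_age(row: int) -> int:
    # Skewed (triangular) distribution between 18 and 90 with mode 35.
    return int(cell_rng("customer", "age", row).triangular(18, 90, 35))


def order_row(row: int) -> tuple[int, int, int]:
    # Relationship: each order references a pseudo-random customer.
    customer_id = cell_rng("orders", "customer_id", row).randrange(NUM_CUSTOMERS)
    # Dependency: recompute the referenced customer's age instead of
    # looking it up, so no worker ever needs another worker's data.
    age = customer_age(customer_id)
    return (row, customer_id, age)


def generate_partition(bounds: tuple[int, int]) -> list[tuple[int, int, int]]:
    start, end = bounds
    return [order_row(r) for r in range(start, end)]


def partitions(total: int, workers: int) -> list[tuple[int, int]]:
    """Split [0, total) into contiguous row ranges, one per worker."""
    step = -(-total // workers)  # ceiling division
    return [(s, min(s + step, total)) for s in range(0, total, step)]


# Partitions can be produced in any order (or on different nodes)
# and stitched together; the result matches a sequential run.
ranges = partitions(10_000, 4)
pieces = {b: generate_partition(b) for b in reversed(ranges)}
rows = [row for b in ranges for row in pieces[b]]
assert rows == generate_partition((0, 10_000))
```

In a real deployment, each `(start, end)` range would be handed to a separate process or node. Seeding via a cryptographic hash is just one choice here; any deterministic mixing of the coordinates into a seed would serve the same purpose.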
Data is one of the most important resources for modern enterprises. Better analytics allow for a better understanding of customer requirements and market dynamics. The more data is collected, the more information can be extracted. However, the extraction of information value is limited by data processing speed. Due to rapid technological advances in big data management, there is now an abundance of big data systems. This leaves users with the dilemma of choosing a system that delivers good end-to-end performance for their use case. To get a good understanding of a system's actual performance, realistic application-level workloads are required. To this end, we have developed BigBench, an application-level benchmark focused solely on big data analytics. In this paper, we present the vision of BigBench 2.0, a suite of benchmarks covering all major aspects of big data processing in common business use cases. Unlike other efforts, BigBench 2.0 will have a completely consistent and integrated model and workload, enabling realistic end-to-end benchmarking of big data systems.