In the first part of this chapter we illustrate how a big data project can be set up and optimized. We explain the general value of big data analytics for the enterprise and how value can be derived by analyzing big data. We go on to introduce the characteristics of big data projects and how such projects can be set up, optimized and managed. Two exemplary real word use cases of big data projects are described at the end of the first part. To be able to choose the optimal big data tools for given requirements, the relevant technologies for handling big data are outlined in the second part of this chapter. This part includes technologies such as NoSQL and NewSQL systems, in-memory databases, analytical platforms and Hadoop based solutions. Finally, the chapter is concluded with an overview over big data benchmarks that allow for performance optimization and evaluation of big data technologies. Especially with the new big data applications, there are requirements that make the platforms more complex and more heterogeneous. The relevant benchmarks designed for big data technologies are categorized in the last part
In this paper, we evaluate the performance of DataStax Enterprise (DSE) using the HiBench benchmark suite and compare it with the corresponding Cloudera's Distribution of Hadoop (CDH) results. Both systems, DSE and CDH were stress tested using CPU-bound (WordCount), I/O-bound (Enhanced DFSIO) and mixed (HiveBench) workloads. The experimental results showed that DSE is better than CDH in writing files, whereas CDH is better than DSE in reading files. Additionally, for DSE the read and write throughput difference is very minor, whereas for CDH the read throughput is much higher than the write throughput. The results we obtained show that the HiBench benchmark suite, developed specifically for Hadoop, can be successfully executed on top of the DataStax Enterprise (DSE).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.