Abstract. BigBench is the new standard (TPCx-BB) for benchmarking and testing Big Data systems. The BigBench specification describes several business use cases which require a broad combination of data extraction techniques, including SQL queries, Map/Reduce, user code (UDF), and Machine Learning code. However, the resource requirements of each query are not yet widely known, as they are for more established benchmarks. Moreover, the current BigBench implementation allows us to combine different frameworks and libraries from the Hadoop ecosystem, including combinations such as Hadoop+Hive+Tez (with Mahout) and Spark (SparkSQL+MLlib), in their different versions and configurations. Over the last year, the Spark framework and APIs have been evolving very rapidly, with major improvements in performance and the release of v2. It is our intent to compare the current state of Spark v2 to Hive's base implementation. At the same time, cloud providers currently offer convenient on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Hive and Spark come ready to use, with a general-purpose configuration and upgrade management. The study characterizes the TPCx-BB queries and the out-of-the-box performance of Spark and Hive versions in the cloud. It also compares the reliability, scalability, and performance of popular PaaS offerings, including Azure HDInsight, Amazon Web Services EMR, and Google Dataproc, with an on-premises commodity cluster as baseline. Results show that configuration tuning is needed in most cloud providers as the data scale grows, especially for Spark's memory usage. The query characterization shows which queries are the most resource-consuming in terms of CPU, memory (especially for ML), and I/O, both disk and network. These results can help practitioners quickly test systems by picking a subset of the queries that stresses each of these categories.
In addition, the results show how Hive and Spark compare across the different query types and what performance can be expected of each in PaaS.