Columnar file formats provide an efficient way to store data to be queried by SQL-on-Hadoop engines. Related works consider the performance of the processing engine and the file format together, which makes it impossible to predict their individual impact. In this work, we propose an alternative approach: by executing each file format on the same processing engine, we compare the different file formats as well as their different parameter settings. We apply our strategy to two processing engines, Hive and SparkSQL, and evaluate the performance of two columnar file formats, ORC and Parquet. We use BigBench (TPCx-BB), a standardized application-level benchmark for Big Data scenarios. Our experiments confirm that the file format selection and its configuration significantly affect the overall performance. We show that ORC generally performs better on Hive, whereas Parquet achieves the best performance with SparkSQL. Using ZLIB compression brings up to a 60.2% improvement with ORC, while Parquet achieves up to a 7% improvement with Snappy. The exceptions are queries involving text processing, which do not benefit from any compression.
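The comparison strategy described above (hold the engine fixed, vary the file format and its settings) can be sketched as a small experiment grid. This is only an illustration under stated assumptions: the function names `experiment_grid` and `improvement_pct`, the `"NONE"` baseline label, and the definition of improvement as the relative reduction in runtime against an uncompressed baseline are all hypothetical and are not taken from the paper.

```python
# Assumed configuration space: the engines, formats, and codecs named in the
# abstract, plus a hypothetical "NONE" (uncompressed) baseline per format.
ENGINES = ["Hive", "SparkSQL"]
FORMATS = {
    "ORC": ["NONE", "ZLIB"],
    "Parquet": ["NONE", "Snappy"],
}


def experiment_grid(engines, formats):
    """Yield (engine, file_format, codec) tuples.

    Every format/codec pair is run on every engine, so results can be
    compared within one engine instead of across engine+format bundles.
    """
    for engine in engines:
        for file_format, codecs in formats.items():
            for codec in codecs:
                yield (engine, file_format, codec)


def improvement_pct(baseline_s, variant_s):
    """Relative runtime reduction of a variant versus a baseline, in percent.

    Assumption: "improvement" means (baseline - variant) / baseline.
    """
    if baseline_s <= 0:
        raise ValueError("baseline runtime must be positive")
    return (baseline_s - variant_s) / baseline_s * 100.0


if __name__ == "__main__":
    for config in experiment_grid(ENGINES, FORMATS):
        print(config)
```

With measured runtimes in hand, `improvement_pct(baseline_s, variant_s)` would be applied per query and per engine, for example comparing ORC with ZLIB against uncompressed ORC on Hive only.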
Network slicing plays a key role in the 5G ecosystem, allowing verticals to introduce new use cases in the industrial sector, i.e., Industry 4.0. However, a widely recognized challenge of network slicing is to provide traffic isolation while concurrently satisfying diverse performance requirements, e.g., bandwidth and latency. This challenge becomes even more important when serving a large number of network traffic flows under resource-limited conditions between distributed sites, e.g., a factory floor and a remote office. In this work, we show that both goals can be met at the same time by applying the virtual queue notion over a priority-queuing-based pipeline in a P4 switch in software-defined networks. To examine the effectiveness of our approach, a proof of concept is set up to serve different requests of Industry 4.0 use cases over a mixed data path, including a P4 switch and Open vSwitch, for a large number of network flows.
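To make the idea of a virtual queue layered over priority queuing more concrete, the following is a minimal software sketch, not the authors' P4 pipeline. It assumes each slice has a virtual queue (a byte counter drained at the slice's configured rate) that decides admission, while admitted packets share one strict-priority queue. The names `VirtualQueue` and `SlicedScheduler`, and all parameters, are hypothetical.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class VirtualQueue:
    """Per-slice byte counter that drains at the slice's configured rate."""
    rate: float          # bytes drained per second (assumed slice bandwidth)
    limit: float         # maximum virtual backlog in bytes
    backlog: float = 0.0
    last: float = 0.0    # time of the previous update, in seconds

    def admit(self, now, size):
        # Drain the virtual backlog for the time elapsed since the last packet.
        self.backlog = max(0.0, self.backlog - (now - self.last) * self.rate)
        self.last = now
        # Reject the packet if it would push the slice past its limit.
        if self.backlog + size > self.limit:
            return False
        self.backlog += size
        return True


class SlicedScheduler:
    """Virtual-queue admission in front of one strict-priority queue."""

    def __init__(self):
        self._heap = []
        self._seq = count()   # tie-breaker keeps FIFO order within a priority
        self._slices = {}

    def add_slice(self, name, priority, rate, limit):
        # Lower priority value is served first.
        self._slices[name] = (priority, VirtualQueue(rate, limit))

    def enqueue(self, name, now, size):
        priority, vq = self._slices[name]
        if not vq.admit(now, size):
            return False      # isolation: an overloaded slice drops its own traffic
        heapq.heappush(self._heap, (priority, next(self._seq), name, size))
        return True

    def dequeue(self):
        if not self._heap:
            return None
        _, _, name, size = heapq.heappop(self._heap)
        return name, size
```

In this sketch the virtual queue bounds how much each slice can inject (isolation), while the priority order decides which admitted packet leaves first (latency). A hardware pipeline would realize the same split with registers or meters rather than Python objects.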