Matteo Pergolesi scite author profile

Columnar file formats provide an efficient way to store data to be queried by SQL-on-Hadoop engines. Related works consider the performance of processing engine and file format together, which makes it impossible to predict their individual impact. In this work, we propose an alternative approach: by executing each file format on the same processing engine, we compare the different file formats as well as their different parameter settings. We apply our strategy to two processing engines, Hive and SparkSQL, and evaluate the performance of two columnar file formats, ORC and Parquet. We use BigBench (TPCx-BB), a standardized application-level benchmark for Big Data scenarios. Our experiments confirm that the file format selection and its configuration significantly affect the overall performance. We show that ORC generally performs better on Hive, whereas Parquet achieves best performance with SparkSQL. Using ZLIB compression brings up to 60.2% improvement with ORC, while Parquet achieves up to 7% improvement with Snappy. Exceptions are the queries involving text processing, which do not benefit from using any compression.

show abstract

Performance Isolation for Network Slices in Industry 4.0: The 5Growth Approach

Chang

Ruiz²,

Paolucci

et al. 2021

IEEE Access

View full text Add to dashboard Cite

Network slicing plays a key role in the 5G ecosystem for verticals to introduce new use cases in the industrial sector, i.e., Industry 4.0. However, a widely recognized challenge of network slicing is to provide traffic isolation and concurrently satisfy diverse performance requirements, e.g., bandwidth and latency. Such challenge becomes even more important when serving a large number of network traffic flows under a resource-limited condition between distributed sites, e.g., factory floor and remote office. In this work, we present the capability to retain these two goals at the same time, by applying the virtual queue notion over a priority queuing based pipeline in P4 switch over software-defined networks. To examine the effectiveness of our approach, a proof-of-concept is setup to serve different requests of Industry 4.0 use cases over a mixed data path, including P4 switch and Open vSwitch, for a large number of network flows.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Matteo Pergolesi

Performance Evaluation of Edge Cloud Computing System for Big Data Applications

Comparison of MongoDB and Cassandra Databases for Spectrum Monitoring As-a-Service

A Performance Comparison of Virtualization Techniques to Deploy a 5G Monitoring Platform

The impact of columnar file formats on SQL‐on‐hadoop engine performance: A study on ORC and Parquet

Performance Isolation for Network Slices in Industry 4.0: The 5Growth Approach

Contact Info

Product

Resources

About