The joint effort of scientific collaborations and the expanding data market creates demand for high-performance, data-intensive analytics infrastructures that can exploit the potential of heterogeneous multi-core architectures with dynamic and scalable execution environments. Contemporary approaches focus on developing efficient parallel application models but lack the flexibility to efficiently integrate and utilize native or accelerator-based code. In this work, we illustrate a novel approach to mending this shortcoming and offering seamless application integration into a highly versatile execution infrastructure. The centerpiece is a framework of containerized execution units, together with their management, for satisfying the diverse requirements of data analytics pipelines and their stages. Containers not only ease distribution and deployment of applications but, more importantly, enable an efficient synthesis of different stage implementation variants aimed at exploiting heterogeneous computing resources. Consequently, this approach allows the infrastructure to utilize mainstream data- and compute-intensive techniques and paradigms to achieve the goal of efficient pipeline execution. We present our approach in the form of a requirement analysis, a multi-tier architecture description, and deployment scenarios based on our current prototype implementation.