Summary
Processing data in a timely manner, data stream processing (DSP) applications are receiving an increasing interest for building new pervasive services. Due to the unpredictability of data sources, these applications often operate in dynamic environments; therefore, they require the ability to elastically scale in response to workload variations. In this paper, we deal with a key problem for the effective runtime management of a DSP application in geo‐distributed environments: We investigate the placement and replication decisions while considering the application and resource heterogeneity and the migration overhead, so to select the optimal adaptation strategy that can minimize migration costs while satisfying the application quality of service (QoS) requirements. We present elastic DSP replication and placement (EDRP), a unified framework for the QoS‐aware initial deployment and runtime elasticity management of DSP applications. In EDRP, the deployment and runtime decisions are driven by the solution of a suitable integer linear programming problem, whose objective function captures the relative importance between QoS goals and reconfiguration costs. We also present the implementation of EDRP and the related mechanisms on Apache Storm. We conduct a thorough experimental evaluation, both numerical and prototype‐based, that shows the benefits achieved by EDRP on the application performance.
Data Stream Processing (DSP) has emerged over the years as the reference paradigm for the analysis of continuous and fast information flows, which have often to be processed with low-latency requirements to extract insights and knowledge from raw data. Dealing with unbounded data flows, DSP applications are typically long-running and, thus, likely experience varying workloads and working conditions over time. To keep a consistent service level in face of such variability, a lot of effort has been spent studying strategies for run-time adaptation of DSP systems and applications. In this survey, we review the most relevant approaches from the literature, presenting a taxonomy to characterize the state of the art along several key dimensions. Our analysis allows us to identify current research trends as well as open challenges that will motivate further investigations in this field.
The capability of efficiently processing the data streams emitted by nowadays ubiquitous sensing devices enables the development of new intelligent services. Data Stream Processing (DSP) applications allow for processing huge volumes of data in near real-time. To keep up with the high volume and velocity of data, these applications can elastically scale their execution on multiple computing resources to process the incoming data flow in parallel. Being that data sources and consumers are usually located at the network edges, nowadays the presence of geo-distributed computing resources represents an attractive environment for DSP. However, controlling the applications and the processing infrastructure in such wide-area environments represents a significant challenge. In this paper, we present a hierarchical solution for the autonomous control of elastic DSP applications and infrastructures. It consists of a two-layered hierarchical solution, where centralized components coordinate subordinated distributed managers, which, in turn, locally control the elastic adaptation of the application components and deployment regions. Exploiting this framework, we design several self-adaptation policies, including reinforcement learning based solutions. We show the benefits of the presented self-adaptation policies with respect to static provisioning solutions, and discuss the strengths of reinforcement learning based approaches, which learn from experience how to optimize the application performance and resource allocation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.