Yuanzhen Ji scite author profile

One fundamental challenge in data stream processing is to cope with the ubiquity of disorder of tuples within a stream caused by network latency, operator parallelization, merging of asynchronous streams, etc. High result accuracy and low result latency are two conflicting goals in out-of-order stream processing. Different applications may prefer different extent of trade-offs between the two goals. However, existing disorder handling solutions either try to meet one goal to the extreme by sacrificing the other, or try to meet both goals but have shortcomings including unguaranteed result accuracy or increased complexity in operator implementation and application logic.To meet different application requirements on the latency versus result accuracy trade-off in out-of-order stream processing, in this paper, we propose to make this trade-off user-configurable. Particularly, focusing on sliding window aggregates, we introduce AQ-K-slack, a buffer-based qualitydriven disorder handling approach. AQ-K-slack leverages techniques from the fields of sampling-based approximate query processing and control theory. It can adjust the input buffer size dynamically to minimize the result latency, while respecting user-specified threshold on relative errors in produced query results. AQ-K-slack requires no a priori knowledge of disorder characteristics of data streams, and imposes no changes to the query operator implementation or the application logic. Experiments over real-world out-of-order data streams show that, compared to the stateof-art, AQ-K-slack can reduce the average buffer size, thus the average result latency, by at least 51% while respecting user-specified requirement on the accuracy of query results.

show abstract

Quality-Driven Continuous Query Execution over Out-of-Order Data Streams

Ji¹,

Zhou²,

Jerzak³

et al. 2015

View full text Add to dashboard Cite

Executing continuous queries over out-of-order data streams, where tuples are not ordered according to timestamps, is challenging; because high result accuracy and low result latency are two conflicting performance metrics. Although many applications allow trading exact query results for lower latency, they still expect the produced results to meet a certain quality requirement. However, none of existing disorder handling approaches have considered minimizing the result latency while meeting user-specified requirements on the quality of query results.In this demonstration, we showcase AQ-K-slack, an adaptive, buffer-based disorder handling approach, which supports executing sliding window aggregate queries over outof-order data streams in a quality-driven manner. By adapting techniques from the field of sampling-based approximate query processing and control theory, AQ-K-slack dynamically adjusts the input buffer size at query runtime to minimize the result latency, while respecting a user-specified threshold on relative errors in produced query results.We demonstrate a prototype stream processing system, which extends SAP Event Stream Processor with the implementation of AQ-K-slack. Through an interactive interface, the audience will learn the effect of different factors, such as the aggregate function, the window specification, the result error threshold, and stream properties, on the latency and the accuracy of query results. Moreover, they can experience the effectiveness of AQ-K-slack in obtaining user-desired latency vs. result accuracy trade-offs, compared to naive disorder handling approaches that make extreme trade-offs. For instance, by scarifying 1% result accuracy, our system can reduce the result latency by 80% when compared to the state of the art.

show abstract

Quality-driven disorder handling for m-way sliding window stream joins

Sun

Nica

et al. 2016

View full text Add to dashboard Cite

Sliding window join is one of the most important operators for stream applications. To produce high quality join results, a stream processing system must deal with the ubiquitous disorder within input streams which is caused by network delay, asynchronous source clocks, etc. Disorder handling involves an inevitable tradeoff between the latency and the quality of produced join results. To meet different requirements of stream applications, it is desirable to provide a user-configurable resultlatency vs. result-quality tradeoff. Existing disorder handling approaches either do not provide such configurability, or support only user-specified latency constraints.In this work, we advocate the idea of quality-driven disorder handling, and propose a buffer-based disorder handling approach for sliding window joins, which minimizes sizes of input-sorting buffers, thus the result latency, while respecting user-specified result-quality requirements. The core of our approach is an analytical model which directly captures the relationship between sizes of input buffers and the produced result quality. Our approach is generic. It supports m-way sliding window joins with arbitrary join conditions. Experiments on real-world and synthetic datasets show that, compared to the state of the art, our approach can reduce the result latency incurred by disorder handling by up to 95% while providing the same level of result quality.1 Note that some join algorithms (e.g., [7], [8]) would miss the result tuple (E 5 , e 7 ) 7 ; because upon receiving a new tuple, they invalidate expired tuples in windows on all streams. Hence, at the arrival of D 8 , the tuple E 5 would expire from the window on S 1 .

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Yuanzhen Ji

Online parameter optimization for elastic data stream processing

Quality-driven processing of sliding window aggregates over out-of-order data streams

Quality-Driven Continuous Query Execution over Out-of-Order Data Streams

Quality-driven disorder handling for m-way sliding window stream joins

Contact Info

Product

Resources

About