Kun-Lung Wu scite author profile

Elastic Scaling for Data Stream Processing

This article addresses the profitability problem associated with auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization involves locating regions in the application's data flow graph that can be replicated at run-time to apply data partitioning, in order to achieve scale. In order to make auto-parallelization effective in practice, the profitability question needs to be answered: How many parallel channels provide the best throughput? The answer to this question changes depending on the workload dynamics and resource availability at run-time. In this article, we propose an elastic auto-parallelization solution that can dynamically adjust the number of channels used to achieve high throughput without unnecessarily wasting resources. Most importantly, our solution can handle partitioned stateful operators via run-time state migration, which is fully transparent to the application developers. We provide an implementation and evaluation of the system on an industrial-strength data stream processing platform to validate our solution.
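
The abstract's point about partitioned stateful operators can be made concrete: when the number of parallel channels changes, every key whose key-to-channel mapping changes must have its state moved to a new owner. The sketch below computes that migration plan. It assumes a crc32-modulo partitioner; `channel_for` and `plan_migration` are hypothetical names chosen for illustration, and the paper's actual partitioning and migration protocol may differ.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def channel_for(key: str, num_channels: int) -> int:
    """Hypothetical partitioner: stable crc32 hash of the key, modulo channel count."""
    return zlib.crc32(key.encode("utf-8")) % num_channels


def plan_migration(state: Dict[str, object], old_n: int,
                   new_n: int) -> Dict[Tuple[int, int], List[str]]:
    """Group the keys whose owning channel changes when scaling old_n -> new_n.

    Returns {(source_channel, destination_channel): [keys to move]}.
    """
    moves: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for key in state:
        src = channel_for(key, old_n)
        dst = channel_for(key, new_n)
        if src != dst:
            moves[(src, dst)].append(key)
    return dict(moves)


if __name__ == "__main__":
    # Per-key values, e.g. running totals kept by a partitioned aggregate operator.
    state = {f"user{i}": i for i in range(1000)}
    moves = plan_migration(state, old_n=4, new_n=5)
    moved = sum(len(keys) for keys in moves.values())
    print(f"{moved} of {len(state)} keys change channels when scaling 4 -> 5")
```

With plain modulo hashing, a key keeps its channel when going from 4 to 5 channels only if its hash modulo 20 is 0 to 3, so roughly four in five keys move. That cost is one reason a real system might prefer a partitioning scheme that limits movement; which scheme the paper uses is not stated here.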

SODA: An Optimizing Scheduler for Large-Scale Stream-Based Distributed Computer Systems

Wolf, Bansal, Hildrum et al. 2008

110

This paper describes the SODA scheduler for System S, a highly scalable distributed stream processing system. Unlike traditional batch applications, streaming applications are open-ended, and the system typically cannot delay the processing of the data. The scheduler must therefore be able to shift resource allocation dynamically in response to changes in resource availability, job arrivals and departures, incoming data rates, and so on. The design assumptions of System S, in particular, pose additional scheduling challenges. SODA must solve a highly complex optimization problem in real time while remaining scalable. SODA relies on a careful problem decomposition and intelligent use of both heuristic and exact algorithms. We describe the design and functionality of SODA, outline the mathematical components, and describe experiments that show the performance of the scheduler.
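
As a flavor of the heuristic side of such a scheduler, the sketch below places processing elements (PEs) on hosts with first-fit decreasing, a generic bin-packing heuristic. It is not SODA's algorithm: the abstract says SODA decomposes the problem and combines heuristic and exact methods, and those details are not reproduced here. The function name, the CPU-demand numbers, and the host capacities are illustrative assumptions.

```python
from typing import Dict, Optional


def first_fit_decreasing(demands: Dict[str, float],
                         capacities: Dict[str, float]) -> Optional[Dict[str, str]]:
    """Place PEs on hosts with the first-fit-decreasing heuristic.

    demands:    hypothetical CPU demand per PE (fraction of a host)
    capacities: available CPU per host, in the same units
    Returns {pe: host}, or None if some PE fits on no host.
    """
    remaining = dict(capacities)
    hosts = list(capacities)  # try hosts in their given order
    placement: Dict[str, str] = {}
    # Largest demands first: big items are hardest to fit, so place them early.
    for pe, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        for host in hosts:
            if remaining[host] >= demand:
                remaining[host] -= demand
                placement[pe] = host
                break
        else:
            return None  # no host has room; a real scheduler would re-plan or shed load
    return placement


if __name__ == "__main__":
    demands = {"parse": 0.6, "join": 0.5, "filter": 0.3, "sink": 0.2}
    print(first_fit_decreasing(demands, {"h1": 1.0, "h2": 1.0}))
    # {'parse': 'h1', 'join': 'h2', 'filter': 'h1', 'sink': 'h2'}
```

A one-shot greedy placement like this ignores the dynamic aspects the abstract emphasizes (job arrivals, changing data rates); a scheduler would have to rerun or repair it as conditions change.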

Elastic scaling of data parallel operators in stream processing

et al. 2009

We describe an approach to elastically scale the performance of a data analytics operator that is part of a streaming application. Our techniques focus on dynamically adjusting the amount of computation an operator can carry out in response to changes in incoming workload and the availability of processing cycles. We show that our elastic approach is beneficial in light of the dynamic aspects of streaming workloads and stream processing environments. Addressing another recent trend, we show the importance of our approach as a means of providing computational elasticity in multicore processor-based environments, so that operators can automatically find their best operating point. Finally, we present experiments driven by synthetic workloads, which show the space where the optimizing efforts are most beneficial, as well as experiments with a radio astronomy imaging application, where we observe substantial improvements in its performance-critical section.
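
One simple way an operator could "find its best operating point" is to hill-climb on its thread count: keep adding threads while throughput still improves by a meaningful margin. The sketch below assumes a `measure(n)` callback that reports throughput with `n` threads and a 5% minimum-gain threshold; both are illustrative choices, not the paper's control law, and `simulated_throughput` is a toy model.

```python
from typing import Callable


def find_operating_point(measure: Callable[[int], float],
                         max_threads: int,
                         min_gain: float = 0.05) -> int:
    """Grow an operator's thread count while each step still pays off.

    measure(n) is assumed to return the observed throughput with n threads.
    Growth stops at the first step whose relative gain is below min_gain.
    """
    best_threads, best_tput = 1, measure(1)
    for n in range(2, max_threads + 1):
        tput = measure(n)
        if tput < best_tput * (1.0 + min_gain):
            break  # diminishing (or negative) returns: keep the previous level
        best_threads, best_tput = n, tput
    return best_threads


def simulated_throughput(n: int) -> float:
    """Toy model: linear scaling up to 4 cores, slight contention beyond that."""
    return 100.0 * min(n, 4) - 5.0 * max(0, n - 4)


if __name__ == "__main__":
    print(find_operating_point(simulated_throughput, max_threads=16))  # prints 4
```

Real throughput measurements are noisy and the workload drifts, so a practical controller would also need to re-probe periodically and shrink the thread count when cycles become scarce; this sketch only shows the upward search.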

Lipophagy mediated carbohydrate-induced changes of lipid metabolism via oxidative stress, endoplasmic reticulum (ER) stress and ChREBP/PPARγ pathways

Zhao, Högstrand et al. 2019

Cell. Mol. Life Sci.

112

Optimizing index allocation for sequential data broadcasting in wireless mobile computing

Chen

2003

IEEE Trans. Knowl. Data Eng.

109

Energy saving is one of the most important issues in wireless mobile computing. One viable approach to achieving energy saving is to use an indexed data organization to broadcast data over wireless channels to mobile units. Using indexed broadcasting, mobile units can be guided to the data of interest efficiently and only need to be actively listening to the broadcasting channel when the relevant information is present. In this paper, we explore the issue of indexing data with skewed access for sequential broadcasting in wireless mobile computing. We first propose methods to build index trees based on access frequencies of data records. To minimize the average cost of index probes, we consider two cases, one for fixed index fanouts and the other for variant index fanouts, and devise algorithms to construct index trees for both cases. We show that the cost of index probes can be minimized not only by employing an imbalanced index tree that is designed in accordance with data access skew, but also by exploiting variant fanouts for index nodes. Note that, even for the same index tree, different broadcasting orders of data records will lead to different average data access times. To address this issue, we develop an algorithm to determine the optimal order for sequential data broadcasting to minimize the average data access time. We also evaluate the performance of the proposed algorithms. Examples and remarks are given to illustrate our results.
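
For the fixed-fanout case, the classic way to build a frequency-aware, deliberately imbalanced tree is a k-ary Huffman construction, sketched below. This is a stand-in technique, not necessarily the paper's algorithm, and it assumes the probe cost of a record equals its depth in the tree (number of index levels above it). The variant-fanout construction and the broadcast-ordering algorithm from the abstract are not shown; `index_depths` and `average_probes` are hypothetical names.

```python
import heapq
from typing import Dict, List, Tuple


def index_depths(freqs: Dict[str, float], fanout: int) -> Dict[str, int]:
    """Depth of each record in a fanout-ary Huffman-style index tree.

    Frequently accessed records end up closer to the root, so the tree is
    imbalanced in accordance with the access skew.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    depths = {key: 0 for key in freqs}
    # Heap entries: (total weight, unique tie-breaker, real records in this subtree).
    heap: List[Tuple[float, int, List[str]]] = [
        (f, i, [key]) for i, (key, f) in enumerate(freqs.items())
    ]
    counter = len(heap)
    # Pad with zero-weight dummy leaves so every merge takes exactly `fanout` nodes.
    while len(heap) > 1 and (len(heap) - 1) % (fanout - 1) != 0:
        heap.append((0.0, counter, []))
        counter += 1
    heapq.heapify(heap)
    while len(heap) > 1:
        weight, members = 0.0, []
        for _ in range(fanout):
            w, _tie, keys = heapq.heappop(heap)
            weight += w
            members.extend(keys)
        for key in members:
            depths[key] += 1  # each merge adds one index level above these records
        heapq.heappush(heap, (weight, counter, members))
        counter += 1
    return depths


def average_probes(freqs: Dict[str, float], depths: Dict[str, int]) -> float:
    """Access-frequency-weighted mean depth, used here as the index-probe cost."""
    total = sum(freqs.values())
    return sum(freqs[k] * depths[k] for k in freqs) / total


if __name__ == "__main__":
    freqs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
    depths = index_depths(freqs, fanout=2)
    print(depths)                         # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
    print(average_probes(freqs, depths))  # about 1.9, vs 2.0 for a balanced binary tree
```

The example illustrates the abstract's claim that an imbalanced tree can beat a balanced one under skew: a balanced binary tree over four records puts every record at depth 2 (average cost 2.0), while the skew-aware tree averages about 1.9.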

FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads

Wolf, Rajan, Hildrum et al. 2010

The CHAMPS system: change management with planning and scheduling

et al.

IBM Streams Processing Language: Analyzing Big Data in motion

Hirzel, Andrade, Gedik et al. 2013

IBM J. Res. & Dev.

104

The IBM Streams Processing Language (SPL) is the programming language for IBM InfoSphere Streams, a platform for analyzing Big Data in motion. By "Big Data in motion," we mean continuous data streams at high data-transfer rates. InfoSphere Streams processes such data with both high throughput and short response times. To meet these performance demands, it deploys each application on a cluster of commodity servers. SPL abstracts away the complexity of the distributed system, instead exposing a simple graph-of-operators view to the user. SPL has several innovations relative to prior streaming languages. For performance and code reuse, SPL provides a code-generation interface to C++ and Java. To facilitate writing well-structured and concise applications, SPL provides higher-order composite operators that modularize stream sub-graphs. Finally, to enable static checking while exposing optimization opportunities, SPL provides a strong type system and user-defined operator models. This paper provides a language overview, describes the implementation including optimizations such as fusion, and explains the rationale behind the language design.
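
The abstract names fusion as one of SPL's implementation optimizations. The Python sketch below only illustrates the general idea behind operator fusion: a linear chain of operators is collapsed into one unit, so tuples pass between operators by direct function calls instead of through queues or network connections. It is not SPL code and not how the SPL compiler implements fusion; the operator signature (one tuple in, a list of zero or more tuples out) and the sample operators are assumptions made for illustration.

```python
from typing import Any, Callable, List

# Assumed operator shape: consumes one tuple, emits zero or more tuples.
Operator = Callable[[Any], List[Any]]


def fuse(*ops: Operator) -> Operator:
    """Fuse a linear chain of operators into a single operator.

    Downstream operators are invoked by direct calls instead of exchanging
    tuples through buffers between separately deployed operators.
    """
    def fused(item: Any) -> List[Any]:
        batch = [item]
        for op in ops:
            batch = [out for t in batch for out in op(t)]
        return batch
    return fused


def parse(line: str) -> List[int]:
    return [int(line)]


def keep_even(x: int) -> List[int]:
    return [x] if x % 2 == 0 else []


def double(x: int) -> List[int]:
    return [x * 2]


if __name__ == "__main__":
    pipeline = fuse(parse, keep_even, double)
    print([out for line in ["1", "2", "3", "4"] for out in pipeline(line)])  # [4, 8]
```

The trade-off fusion navigates is that fused operators share one execution context, which removes per-tuple communication cost but also removes the option of spreading those operators across hosts.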
