2010 IEEE International Conference on Data Mining Workshops, 2010
DOI: 10.1109/icdmw.2010.172

S4: Distributed Stream Computing Platform

Abstract: S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) emit one or more events which may be consumed by other PEs, (2) publish results. The architecture resembles the Actors model [1], providing semantics of encapsulation and loca…
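The routing model in the abstract (keyed events delivered with affinity to PEs, which either emit further events or publish results) can be illustrated with a small single-process sketch. Everything here is hypothetical and is not S4's actual API (S4 itself is a Java platform): the `Event`, `PE`, `WordCountPE`, `ThresholdPE` and `Router` names, the word-count task, and the threshold of 3 are assumptions chosen for illustration. The sketch assumes one PE instance per (stream, key) pair, created lazily on the first event for that key.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple

Result = Tuple[str, int]


@dataclass(frozen=True)
class Event:
    stream: str  # logical stream name; selects which PE type handles the event
    key: str     # keyed attribute; selects which PE instance handles it
    value: int


class PE:
    """Hypothetical base class: one instance per (stream, key) pair."""

    def __init__(self, key: str, publish: Callable[[Result], None]) -> None:
        self.key = key
        self.publish = publish

    def on_event(self, event: Event) -> List[Event]:
        """Consume one event; return any events to emit downstream."""
        raise NotImplementedError


class WordCountPE(PE):
    """Keeps a running count for its word and emits it on the 'wordcount' stream."""

    def __init__(self, key: str, publish: Callable[[Result], None]) -> None:
        super().__init__(key, publish)
        self.count = 0

    def on_event(self, event: Event) -> List[Event]:
        self.count += event.value
        return [Event("wordcount", self.key, self.count)]


class ThresholdPE(PE):
    """Publishes a result the first time its word's count reaches THRESHOLD."""

    THRESHOLD = 3  # hypothetical threshold

    def __init__(self, key: str, publish: Callable[[Result], None]) -> None:
        super().__init__(key, publish)
        self.reported = False

    def on_event(self, event: Event) -> List[Event]:
        if not self.reported and event.value >= self.THRESHOLD:
            self.reported = True
            self.publish((self.key, event.value))
        return []  # terminal PE: emits nothing further


class Router:
    """Routes each event to the PE instance for its (stream, key), creating it lazily."""

    def __init__(self, factories: Dict[str, Callable[[str, Callable[[Result], None]], PE]]) -> None:
        self.factories = factories
        self.instances: Dict[Tuple[str, str], PE] = {}
        self.results: List[Result] = []

    def send(self, event: Event) -> None:
        queue: Deque[Event] = deque([event])
        while queue:
            ev = queue.popleft()
            ident = (ev.stream, ev.key)
            pe = self.instances.get(ident)
            if pe is None:
                pe = self.factories[ev.stream](ev.key, self.results.append)
                self.instances[ident] = pe
            queue.extend(pe.on_event(ev))  # emitted events are routed the same way


router = Router({"word": WordCountPE, "wordcount": ThresholdPE})
for word in "a b a c a b".split():
    router.send(Event("word", word, 1))
print(router.results)  # [('a', 3)]
```

Because each PE instance owns the state for exactly one key, no locking is needed between instances, which mirrors the encapsulation the abstract attributes to the Actors-like design. A distributed platform would additionally have to place these instances on different machines.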

Cited by 750 publications (425 citation statements)
References: 9 publications
“…In contrast with batch processing, stream computing views the data as a sequence of elements made available over time, allowing elements to be processed one by one rather than in large batches [10,11], and this mode may be a good fit for modern general-purpose processors [12]. Stream computing is a potential way for solving this problem, and four requirements related with designing an algorithm in multi-core architectures should be met [13,14]:…”
Section: Stream Computing | mentioning | confidence: 99%
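The contrast drawn in the statement above, between processing a whole batch at once and processing elements one by one as they arrive, can be sketched with a running mean. The choice of a mean and the function names `batch_mean` and `streaming_means` are illustrative assumptions, not taken from the cited work. The streaming version keeps only a count and a total, so it also works on an unbounded source, which the batch version cannot consume.

```python
import itertools
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    # Batch style: the whole collection must be available before computing once.
    return sum(values) / len(values)


def streaming_means(values: Iterable[float]) -> Iterator[float]:
    # Stream style: constant-size state (count, total) is updated per element,
    # and an up-to-date result is yielded after each one.
    count = 0
    total = 0.0
    for v in values:
        count += 1
        total += v
        yield total / count


data = [2.0, 4.0, 6.0, 8.0]
print(list(streaming_means(data)))  # [2.0, 3.0, 4.0, 5.0]
print(batch_mean(data))             # 5.0

# An unbounded source (1, 2, 3, ...): take only the first three running results.
print(list(itertools.islice(streaming_means(itertools.count(1)), 3)))  # [1.0, 1.5, 2.0]
```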
“…This work focuses on elastically scaling the performance of individual streaming operators on multicore machines, whereas our work provides a more general architecture for distribution and a platform that can also serve as basis for elastic stream processing. Yahoo's S4 [21] provides an architecture and platform for processing streaming data similar to MapReduce [10] for stored data, and the similar key property of a specific, simple processing model that enables automatic parallelization and deployment on a large number of machines. StreamCloud [15] is a middleware layer that sits on top of streaming engines and focuses on how to parallelize continuous queries by splitting them into subqueries and distributing them to nodes.…”
Section: Related Work | mentioning | confidence: 99%
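The "simple processing model that enables automatic parallelization" mentioned above rests on keyed events always reaching the same place. A minimal sketch of one common way to get that affinity across machines is hash partitioning of the key; the use of CRC-32, the modulo-by-node-count rule, and the names `node_for_key` and `partition` are assumptions for illustration, not S4's documented scheme.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def node_for_key(key: str, num_nodes: int) -> int:
    # CRC-32 is deterministic across processes, unlike Python's built-in hash()
    # for str (randomized per process), so every sender picks the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(keys: List[str], num_nodes: int) -> Dict[int, List[str]]:
    # Group keyed events by the node that owns their key.
    assignment: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        assignment[node_for_key(key, num_nodes)].append(key)
    return dict(assignment)


events = ["apple", "banana", "apple", "cherry", "banana", "apple"]
plan = partition(events, num_nodes=3)
print(plan)  # every occurrence of a given key lands on the same node
```

A plain modulo rule reassigns most keys whenever the node count changes; consistent hashing is the usual alternative when nodes join or leave.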
“…In fact, several engines have been extended with middleware platforms: IBM's System S [17] or Yahoo's S4 [21]. These systems are built as extensions to one particular SPE.…”
Section: Introduction | mentioning | confidence: 99%
“…Another approach is to use computing frameworks designed for distributed computing [10,11,12,13]. Providing simple programming interfaces for users to develop their applications, these frameworks hide implementation issues such as data distribution, load balancing, data locality and fault tolerance.…”
Section: Introduction | mentioning | confidence: 99%