Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles 2013
DOI: 10.1145/2517349.2522737

Discretized streams

Abstract: Many "big data" applications must act on data in real time. Running these applications at ever-larger scales requires parallel platforms that automatically handle faults and stragglers. Unfortunately, current distributed stream processing models provide fault recovery in an expensive manner, requiring hot replication or long recovery times, and do not handle stragglers. We propose a new processing model, discretized streams (D-Streams), that overcomes these challenges. D-Streams enable a parallel recovery mech…

Citations: cited by 755 publications (53 citation statements)
References: 25 publications (40 reference statements)
“…We also summarize some contributions and case studies from the industry. [7,33,61,83,89,90,93-95]. For example, the development of Spark's MLlib began from MLbase 6 project, and then, other projects started to contribute (e.g., KeystoneML 7 ).…”
Section: Overview Of Apache Spark (mentioning)
confidence: 99%
“…Apache Spark system consists of several main components including Spark core [90,93,94] and upper-level libraries: Spark's MLlib for machine learning [61], GraphX [33,83,85] for graph analysis, Spark Streaming [95] for stream processing and Spark SQL [7] for structured data processing. It is evolving rapidly with changes to its core APIs and addition of upper-level libraries.…”
Section: Main Components and Features (mentioning)
confidence: 99%
“…Mario also uses HBase for data provenance and single-pass reservoir sampling. The iterative processing in Mario is similar to Spark Streaming [38]. Mario splits the data randomly into many small parts and distributes these on the cluster nodes.…”
Section: Mario (mentioning)
confidence: 99%