Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing

Zaharia, Matei; Das, Tathagata; Li, Haoyuan; Hunter, Timothy; Shenker, Scott; Stoica, Ion

doi:10.21236/ada575859

Cited by 306 publications

(389 citation statements)

References 25 publications

Statement types: Supporting, Mentioning (317), Contrasting, Unclassified

“…Although, streaming regression algorithms (e.g. Spark Streaming [19]) based on micro batch analysis [20] can provide faster solution but these algorithms do it at the expense of less accuracy. In our earlier work [21], we proposed a solution highlighting these drawbacks and presented initial results.…”

Section: B Motivation (mentioning)

confidence: 99%

Predictive Analytics for Complex IoT Data Streams

Akbar

Khan

Carrez

et al. 2017

IEEE Internet Things J.

115

Abstract: The requirements of analyzing heterogeneous data streams and detecting complex patterns in near real-time have raised the prospect of Complex Event Processing (CEP) for many internet of things (IoT) applications. Although CEP provides a scalable and distributed solution for analyzing complex data streams on the fly, it is designed for reactive applications as CEP acts on near real-time data and does not exploit historical data. In this regard, we propose a proactive architecture which exploits historical data using machine learning (ML) for prediction in conjunction with CEP. We propose an adaptive prediction algorithm called Adaptive Moving Window Regression (AMWR) for dynamic IoT data and evaluated it using a real-world use case with an accuracy of over 96%. It can perform accurate predictions in near real-time due to reduced complexity and can work along CEP in our architecture. We implemented our proposed architecture using open source components which are optimized for big data applications and validated it on a use-case from Intelligent Transportation Systems (ITS). Our proposed architecture is reliable and can be used across different fields in order to predict complex events.

“…Spark introduces in memory partitions and computing, thereby reducing frequent hard disk reads and writes, which improves the response time which is the key characteristic of stream computing. Discretized Streams [11] are tuples of Resilient Distributed Data Sets (RDDs) [10], which process streams as short, deterministic tasks which are also stateless. RDDs reconstruct themselves through lineage information, thereby achieving fault tolerance [12].…”

Section: A Streaming Data Analytics (mentioning)

confidence: 99%

“…Discretized Streams [11] are tuples of Resilient Distributed Data Sets (RDDs) [10], which process streams as short, deterministic tasks which are also stateless. RDDs reconstruct themselves through lineage information, thereby achieving fault tolerance [12].…”

Section: A Streaming Data Analytics (mentioning)

confidence: 99%

A Survey on Realtime Analytics Framework for Smart Grid Energy Management

K. Sornalakshmi

2015

IJIRSET

Smart grids are modernized electricity grids with information technology support. Smart grids are the most promising development in the energy and utilities market. Smart grids are being installed in many countries and are expected to have multi-fold benefits in efficient energy management. Smart grids receive real-time meter data with high velocity and volume. In such a scenario, efficient near-real-time analytics of streaming smart meter data and quick decision making are significant. In this paper, we survey the existing methodologies and means for real-time energy data management in smart grids.

“…Obviously, more efficient algorithms are required, and thus, the RkNN problem has been studied extensively in the past years for centralized environments [16]. But, with the fast increase in the scale of the big input datasets, parallel and distributed algorithms for RkNNQ in MapReduce [2] have been designed and implemented [6,7], and there are no RkNNQ implementations in Spark [17].…”

Section: Introduction (mentioning)

confidence: 99%

RkNN Query Processing in Distributed Spatial Infrastructures: A Performance Study

García-García

Corral

Iribarne

et al. 2017

Lecture Notes in Computer Science

Abstract. The Reverse k-Nearest Neighbor (RkNN) problem, i.e. finding all objects in a dataset that have a given query point among their corresponding k-nearest neighbors, has received increasing attention in the past years. RkNN queries are of particular interest in a wide range of applications such as decision support systems, resource allocation, profile-based marketing, location-based services, etc. With the current increasing volume of spatial data, it is difficult to perform RkNN queries efficiently in spatial data-intensive applications, because of the limited computational capability and storage resources. In this paper, we investigate how to design and implement distributed RkNN query algorithms using shared-nothing spatial cloud infrastructures as SpatialHadoop and LocationSpark. SpatialHadoop is a framework that inherently supports spatial indexing on top of Hadoop to perform efficiently spatial queries. LocationSpark is a recent spatial data processing system built on top of Spark. We have evaluated the performance of the distributed RkNN query algorithms on both SpatialHadoop and LocationSpark with big real-world datasets. The experiments have demonstrated the efficiency and scalability of our proposal in both distributed spatial data management systems, showing the performance advantages of LocationSpark.
