Scalable storage support for data stream processing

Sebepou, Zoe; Magoutis, Kostas

doi:10.1109/msst.2010.5496977

Cited by 6 publications

(3 citation statements)

References 7 publications


“…As a result, Big Data analysis necessitates tremendously time-consuming navigation through a gigantic search space to provide guidelines and obtain feedback from users. Thus, Sebepou and Magoutis [ 87 ] proposed a scalable system of data streaming with a persistent storage path. This path influences the performance properties of a scalable streaming system slightly.…”

Section: Life Cycle and Management Of Data Using Technologies And … (citation type: mentioning)


Big Data: Survey, Technologies, Opportunities, and Challenges

Khan, Yaqoob, Hashem, et al. (2014). The Scientific World Journal.



Big Data has gained much attention from academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds the boundary range. Currently, over 2 billion people worldwide are connected to the Internet, and over 5 billion individuals own mobile phones. By 2020, 50 billion devices are expected to be connected to the Internet. By then, predicted data production will be 44 times greater than that in 2009. As information is transferred and shared at light speed on optical fiber and wireless networks, the volume of data and the speed of market growth increase. However, the fast growth rate of such large data generates numerous challenges, such as the rapid growth of data, transfer speed, diverse data, and security. Nonetheless, Big Data is still in its infancy, and the domain has not been reviewed in general. Hence, this study comprehensively surveys and classifies the various attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a data life cycle that uses the technologies and terminologies of Big Data. Future research directions in this field are determined based on opportunities and several open issues in the Big Data domain. These research directions facilitate the exploration of the domain and the development of optimal techniques to address Big Data.

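The excerpt from Khan et al. summarizes this paper as adding a persistent storage path to a scalable streaming system. The Python sketch below illustrates the general idea of persisting the stream that connects two operators: tuples are appended to a log and made durable before they are forwarded downstream. All names (`PersistentStreamPath`, `read_log`, the tab-separated line format) are hypothetical, and the batched `fsync` is an assumption about how the cost of such a path could be kept small, not the design described in the paper.

```python
import os
import tempfile
from typing import Callable, List, Tuple

# Hypothetical tuple shape: (timestamp, payload). Payloads are assumed to
# contain no newline characters, since the log stores one tuple per line.
StreamTuple = Tuple[int, str]


class PersistentStreamPath:
    """Hypothetical persist-then-forward link between two stream operators.

    Tuples are buffered, appended to a log file and fsync'ed, and only then
    handed to the downstream operator, so every tuple the downstream operator
    has seen is already durable and can be replayed after a failure.
    """

    def __init__(self, log_path: str,
                 downstream: Callable[[StreamTuple], None],
                 batch_size: int = 4) -> None:
        self.log_path = log_path
        self.downstream = downstream
        self.batch_size = batch_size  # assumed knob: amortizes one fsync over many tuples
        self._pending: List[StreamTuple] = []  # not durable until flush() runs

    def push(self, tup: StreamTuple) -> None:
        self._pending.append(tup)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        with open(self.log_path, "a", encoding="utf-8") as log:
            for ts, payload in self._pending:
                log.write(f"{ts}\t{payload}\n")
            log.flush()
            os.fsync(log.fileno())  # make the batch durable before forwarding
        for tup in self._pending:
            self.downstream(tup)
        self._pending.clear()


def read_log(log_path: str) -> List[StreamTuple]:
    """Reads the persisted stream back, e.g. to replay it after a failure."""
    tuples: List[StreamTuple] = []
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            ts, payload = line.rstrip("\n").split("\t", 1)
            tuples.append((int(ts), payload))
    return tuples


# Usage: five tuples through a link with batches of two.
received: List[StreamTuple] = []
with tempfile.TemporaryDirectory() as tmp:
    link = PersistentStreamPath(os.path.join(tmp, "op1_to_op2.log"),
                                received.append, batch_size=2)
    for i in range(5):
        link.push((i, f"event-{i}"))
    link.flush()  # drain the final partial batch
    logged = read_log(link.log_path)
print(logged == received)  # True
```

Larger batches mean fewer `fsync` calls per tuple, but tuples still sitting in the buffer are neither durable nor delivered, so the batch size trades storage overhead against added latency on the path.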

“…the information about the point in time from where to replay tuples in case of failure) is not maintained mixing regular output tuples with checkpoint tuples but rather using output tuples header (finer granularity and lower overhead); (2) the earliest timestamp [is] maintained on-line in StreamCloud, avoiding thus unnecessary read operations in the parallel file system to retrieve its value and finally, (3) [SM11] do not consider dynamic setups (elasticity) nor stateful operators garbage collection mechanisms. The authors of [SM11] leverage previous work [SM10] on how to efficiently persist a stream connecting two data streaming operators relying on a parallel file system. StreamCloud leverages and improves on this work by providing a better way to persist streams adopting a self-identifying naming convention for the persisted information; thus avoiding metadata maintenance as in [SM10] and reducing the runtime protocol impact (about 20ms in the proposed work to approximately 1ms in StreamCloud, as presented in 5.6.2).…”

Section: Fault Tolerance Techniques (citation type: mentioning)



StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine

Gulisano


I would like to thank my supervisor Ricardo Jiménez Peris, my thesis co-director Patrick Valduriez and Marta Patiño-Martínez for their help. Thanks also to all the lab colleagues (especially Mar, Damián, Paco and Claudio) and the people I had the opportunity to work with (especially Zhang, prof. Marina Papatriantafilou, Zoe and prof. Kostas Magoutis). Special thanks go to Rocío, my friends and my family.

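The StreamCloud excerpt credits a self-identifying naming convention for persisted stream data with removing the metadata maintenance required in [SM10]. The Python sketch below shows one way such a convention could work, under stated assumptions: each persisted segment's file name encodes a stream identifier and the first and last timestamps it holds, so a recovering operator can locate the segments to replay from a given timestamp by listing the directory, with no separate metadata store. The name format and the functions `segment_name`, `parse_segment_name`, `write_segment`, and `replay_from` are hypothetical illustrations, not StreamCloud's actual scheme.

```python
import os
import tempfile
from typing import List, Tuple

# Hypothetical tuple shape: (non-negative integer timestamp, payload without newlines).
StreamTuple = Tuple[int, str]

SUFFIX = ".seg"


def segment_name(stream_id: str, first_ts: int, last_ts: int) -> str:
    """Builds a self-identifying file name: stream id plus the segment's timestamp range."""
    # Zero-padding keeps lexicographic and numeric order aligned in directory listings.
    return f"{stream_id}__{first_ts:020d}__{last_ts:020d}{SUFFIX}"


def parse_segment_name(name: str) -> Tuple[str, int, int]:
    """Recovers (stream_id, first_ts, last_ts) from a segment file name."""
    stem = name[: -len(SUFFIX)]
    # rsplit from the right so a stream id that itself contains "__" still parses.
    stream_id, first, last = stem.rsplit("__", 2)
    return stream_id, int(first), int(last)


def write_segment(directory: str, stream_id: str, tuples: List[StreamTuple]) -> str:
    """Persists a non-empty, timestamp-ordered batch of tuples as one segment file."""
    path = os.path.join(directory, segment_name(stream_id, tuples[0][0], tuples[-1][0]))
    with open(path, "w", encoding="utf-8") as f:
        for ts, payload in tuples:
            f.write(f"{ts}\t{payload}\n")
    return path


def replay_from(directory: str, stream_id: str, from_ts: int) -> List[StreamTuple]:
    """Returns every persisted tuple of stream_id with timestamp >= from_ts, in order.

    Only file names are consulted to choose segments; no metadata file is kept.
    """
    candidates = []
    for name in os.listdir(directory):
        if not name.endswith(SUFFIX):
            continue
        sid, first_ts, last_ts = parse_segment_name(name)
        if sid == stream_id and last_ts >= from_ts:  # skip segments entirely before from_ts
            candidates.append((first_ts, name))
    replayed: List[StreamTuple] = []
    for _, name in sorted(candidates):
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            for line in f:
                ts_text, payload = line.rstrip("\n").split("\t", 1)
                if int(ts_text) >= from_ts:
                    replayed.append((int(ts_text), payload))
    return replayed


# Usage: two segments for one stream, one for another; replay from timestamp 3.
with tempfile.TemporaryDirectory() as tmp:
    write_segment(tmp, "opA-opB", [(1, "a"), (2, "b"), (3, "c")])
    write_segment(tmp, "opA-opB", [(4, "d"), (5, "e")])
    write_segment(tmp, "opX-opY", [(2, "other")])
    replayed = replay_from(tmp, "opA-opB", from_ts=3)
print(replayed)  # [(3, 'c'), (4, 'd'), (5, 'e')]
```

Because the timestamp range lives in the name, choosing which segments to read costs one directory listing rather than a lookup in a separately maintained index; the trade-off in this sketch is that a segment's last timestamp must be known before the file is named.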

CEC: Continuous eventual checkpointing for data stream processing operators

Sebepou, Magoutis (2011). 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).

