2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing 2014
DOI: 10.1109/ccgrid.2014.97

V for Vicissitude: The Challenge of Scaling Complex Big Data Workflows

Abstract: In this paper we present the scaling of BTWorld, our MapReduce-based approach to observing and analyzing the global BitTorrent network, which we have been monitoring for the past 4 years. BTWorld currently provides a comprehensive and complex set of queries implemented in Pig Latin, with data dependencies between them, which translate to several MapReduce jobs that have a heavy-tailed distribution with respect to both execution time and input size characteristics. Processing BitTorrent data in excess o…

Cited by 6 publications (12 citation statements)
References 9 publications
“…Each of these studies required the development of new systems for measurement and analysis, including Multi-Probe [62] and BTWorld [63], which are both global-scale monitors for BT-ecosystems, the former also focusing on collecting Internet-tracing data (not possible anymore under GDPR laws), the latter focusing on efficient collection of aggregate-data. In 2014, while trying to analyze the full BTWorld-dataset, we developed a novel big data analytics pipeline [67]; the process allowed us to discover the phenomenon of vicissitude [38] (see Section 2.5).…”
Section: The Design of P2P Systems
Citation type: mentioning
confidence: 99%
“…Ecosystems are super-distributed [1]: they are recursively distributed, with their constituents often being distributed (eco)systems; yet, FRs and NFRs in distributed systems are not known to be directly composable across ecosystems. Various dynamic phenomena appear in distributed ecosystems, seemingly unique situations that do not fit the patterns expected from current theory and practice; for example, vicissitude [38] is a class of phenomena where several known bottlenecks appear seemingly at random in various parts of the system, performance variability is common in clouds [39], datacenter networks [40], and big data operations [41], and ecosystem owners spar with each other (e.g., in Jan 2019, Apple denied Facebook and Google access to its APIs, Unity changed their Terms-of-Service and thus locked out small developers like SpatialOS).…”
Section: New Challenges in MCS Design
Citation type: mentioning
confidence: 99%
“…Chains and workflows of MapReduce jobs are useful [52], but could be difficult to manage and troubleshoot. Vicissitude workflows of MapReduce jobs lead to diverse challenges, by stressing different system resources at different or even the same time [39]. Workloads can be dominated by a few MapReduce jobs, used periodically or in bursts [13].…”
Section: MapReduce-based Data-intensive Batch Processing
Citation type: mentioning
confidence: 99%
“…This property signifies a combination between the large volume of data and the complexity of processing workflow, which prevent to gather useful insights in data [18].…”
Section: Big Data Applications
Citation type: mentioning
confidence: 99%