2013 IEEE International Conference on Big Data
DOI: 10.1109/bigdata.2013.6691631

The BTWorld use case for big data analytics: Description, MapReduce logical workflow, and empirical evaluation

Abstract: The commoditization of big data analytics, that is, the deployment, tuning, and future development of big data processing platforms such as MapReduce, relies on a thorough understanding of relevant use cases and workloads. In this work we propose BTWorld, a use case for time-based big data analytics that is representative for processing data collected periodically from a global-scale distributed system. BTWorld enables a data-driven approach to understanding the evolution of BitTorrent, a global file-s…

Cited by 18 publications (17 citation statements)
References 21 publications
“…BTWORLD is a complex and very challenging MapReduce-based logical workflow which we designed to observe the evolution of the global-scale peer-to-peer system BitTorrent [18]. The workflow consists of 26 MapReduce jobs with different resource bottlenecks (CPU, memory, or disk) and is used for processing monitoring data collected periodically from the BitTorrent system.…”
Section: The Dynamictags Policy
mentioning
confidence: 99%
“…In our experience with processing monitoring data from the BitTorrent global network using a MapReduce-based logical workflow [18], we have found that 15% of the jobs account for 80% of the total load, and that 65% of the jobs complete in a minute. Similarly, several studies on the performance of modern production clusters in … (Footnote 1: Inspired by the dinosaur Tyrannosaurus rex, known for its long, heavy tail.)…”
mentioning
confidence: 99%
“…At the same time, we surpass our own previous work by a factor of 15 [2]. Most importantly though, the problem we are addressing in this paper concerns the global BitTorrent network, which is of an unprecedented scale in peer-to-peer research.…”
Section: Introduction
mentioning
confidence: 70%
“…Starting with 2009, in our BTWorld project [1] we have conducted a longitudinal experiment in observing the global BitTorrent network during which we have collected over 15 TB of operational data. Although we have created a MapReduce-based logical workflow to extract insightful knowledge about the evolution of the BitTorrent network [2], the vicissitude of processing our BitTorrent data, that is the combination between large volume of data and the complexity of the processing workflow, has prevented us until now to gather useful insights. To address this problem, in this work we demonstrate the scaling of the BTWorld workflow and process an order of magnitude more data than in our previous attempt [2].…”
Section: Introduction
mentioning
confidence: 99%