2009
DOI: 10.1016/j.comnet.2009.02.019

Wikipedia workload analysis for decentralized hosting

Abstract: We study an access trace containing a sample of Wikipedia's traffic over a 107-day period, aiming to identify appropriate replication and distribution strategies in a fully decentralized hosting environment. We perform a global analysis of the whole trace and a detailed analysis of the requests directed to the English edition of Wikipedia. In our study, we classify client requests and examine aspects such as the number of read and save operations, significant load variations and requests for non…

Cited by 287 publications (135 citation statements)
References 23 publications
“…All three vary between nearly idle and maximum throughput within a four-minute period: the slope pattern gradually raises the target load from 0 to 6.2M RPS and then reduces it; the step pattern increases the load by 500 KRPS every 10 seconds; and finally the sine+noise pattern is a basic sinusoidal pattern modified by randomly adding sharp noise that is uniformly distributed over [-250,+250] KRPS and re-computed every 5 seconds. The slope pattern provides a baseline to study smooth changes, the step pattern models abrupt and massive changes, while the sine+noise pattern is representative of daily web patterns [48]. Fig.…”
Section: Discussion
confidence: 99%
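The three synthetic load patterns quoted above are described precisely enough to reconstruct. The following is a minimal sketch in Python of one possible interpretation, assuming a 240-second period and one full sine cycle per period; the function names and these assumptions are illustrative and do not come from the citing paper itself.

```python
# Hedged sketch: three target-load patterns (slope, step, sine+noise) as
# described in the citation statement above. Parameters not stated there
# (period length, sine cycle) are assumptions for illustration.
import math
import random

PERIOD_S = 240           # each pattern spans roughly four minutes (assumed)
PEAK_RPS = 6_200_000     # 6.2M requests per second, the maximum target load

def slope_load(t: float) -> float:
    """Slope pattern: ramp from 0 up to the peak, then back down."""
    half = PERIOD_S / 2
    return PEAK_RPS * (t / half if t <= half else (PERIOD_S - t) / half)

def step_load(t: float) -> float:
    """Step pattern: raise the target load by 500 KRPS every 10 seconds."""
    return float(min(PEAK_RPS, 500_000 * (int(t) // 10)))

_noise_by_slot: dict = {}  # one noise value per 5-second slot

def sine_noise_load(t: float) -> float:
    """Sine+noise pattern: sinusoidal baseline plus uniform noise in
    [-250, +250] KRPS, re-drawn every 5 seconds (one cycle per period assumed)."""
    base = PEAK_RPS / 2 * (1 + math.sin(2 * math.pi * t / PERIOD_S))
    slot = int(t) // 5
    if slot not in _noise_by_slot:
        _noise_by_slot[slot] = random.uniform(-250_000, 250_000)
    return max(0.0, base + _noise_by_slot[slot])

if __name__ == "__main__":
    # Sample each pattern every 30 simulated seconds.
    for t in range(0, PERIOD_S + 1, 30):
        print(f"t={t:3d}s  slope={slope_load(t):>12,.0f}  "
              f"step={step_load(t):>12,.0f}  sine+noise={sine_noise_load(t):>12,.0f}")
```

Sampling these functions once per simulated second yields the three target-load curves the statement refers to: a smooth ramp, abrupt steps, and a daily-traffic-like oscillation with sharp noise.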
“…To build our workloads in the simulator, we use a trace of Internet traffic from Wikipedia.org [45]. In particular, we use a two-month-long trace file containing 10% of the user requests that arrived at Wikipedia between October 1st, 2007 and November 30th, 2007.…”
Section: Real-world Workload Traces
confidence: 99%
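The statement above drives a simulator with a 10% sample of Wikipedia requests. A minimal sketch of turning such a trace into a per-second request rate is shown below; it assumes a whitespace-separated line format whose second field is a Unix timestamp, and the file name is hypothetical. Both assumptions should be checked against the published trace, and the 10% sample would still need to be scaled up (roughly by 10) to approximate the full load.

```python
# Hedged sketch: convert an access-trace file into a per-second request rate
# for replay in a simulator. The assumed line layout (counter, Unix timestamp,
# URL, flag) and the file name are illustrative, not taken from the trace spec.
from collections import Counter

def requests_per_second(trace_path: str) -> Counter:
    """Count requests falling in each whole second of the trace."""
    rate: Counter = Counter()
    with open(trace_path, encoding="utf-8", errors="replace") as trace:
        for line in trace:
            fields = line.split()
            if len(fields) < 2:
                continue                    # skip malformed lines
            timestamp = float(fields[1])    # assumed: second field is a Unix timestamp
            rate[int(timestamp)] += 1
    return rate

if __name__ == "__main__":
    rate = requests_per_second("wikipedia_trace_sample.log")  # hypothetical file name
    if rate:
        print(f"{len(rate)} distinct seconds, peak {max(rate.values())} req/s in the sample")
```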
“…As we do not have data on the availability requirements of website owners, we generate this dataset from the frequency of end-user HTTP requests directed at different websites and the number of missed requests, similarly to how reliability is determined from the mean time between failures [17]. A public dataset of HTTP requests made to Wikipedia [19] is used. To obtain data for different websites, we treat Wikipedia in each language as an individual website, because each has a unique group of end users (different in number and usage pattern).…”
Section: User Requirements Model
confidence: 99%
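The statement above derives availability requirements from request frequencies and missed requests, treating each language edition as a separate website. A minimal sketch of that calculation, with invented per-language counts purely for illustration, might look as follows.

```python
# Hedged sketch: per-website availability derived from request counts, in the
# spirit of the statement above (each Wikipedia language edition treated as a
# separate website). The counts below are invented for illustration only.

def availability(total_requests: int, missed_requests: int) -> float:
    """Fraction of requests served successfully: 1 - missed / total."""
    if total_requests == 0:
        return 1.0
    return 1.0 - missed_requests / total_requests

# Illustrative (total, missed) request counts per language edition -- not real data.
editions = {
    "en": (1_000_000, 120),
    "de": (250_000, 40),
    "ja": (180_000, 15),
}

for lang, (total, missed) in editions.items():
    print(f"{lang}: derived availability requirement ~ {availability(total, missed):.5f}")
```

The analogy to reliability from the mean time between failures is that both reduce an observed failure frequency to a single requirement figure; here the "failures" are requests that went unserved.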