Simulating distributed applications and overlay networks is challenging: results generated in simulation often do not match experimental results. Distributed testbeds like Planet-Lab help to bridge this gap, but they do not offer enough nodes for an Internet-scale evaluation. In this paper we use a tool called TopDNS to generate realistic topologies for simulations, using Planet-Lab to collect measurement data. We show that simulation results may differ significantly from earlier results obtained with synthesized topologies. We provide a data analysis to explain the observed results and to give a better understanding of latency between hosts in certain DNS name spaces.
I. Introduction

In recent years, peer-to-peer overlay networks have been used to implement applications that were previously built on client-server architectures. Such applications, like group communication [1], web caching [2], [3], block storage [4], and e-mail [5], should benefit from the increased reliability and connectivity of overlay networks.

One significant problem has been the correct evaluation of such systems using large-scale simulation. Apart from the scalability of the simulator, a network topology must be chosen. For Internet-scale systems this is difficult, because the Internet is a large dynamic network with an unknown, continuously changing topology. Often a selection of known networks and nodes is used to extract certain characteristics, such as connectivity, distance distribution, and bandwidth distribution. These properties are then scaled up to match the estimated size of the Internet or the targeted deployment size of the overlay application.

During simulation, three gaps limit the validity of the results: (1) the gap between Internet addresses and addresses in abstract topologies, (2) the gap between real workloads and abstract workloads, and (3) the gap between Internet characteristics and community characteristics.

The first and second gaps arise from the common practice of simulating the application's behavior on the overlay using workloads recorded from earlier client-server implementations. This does not work properly if the workload contains references to the network topology, e.g., IP addresses, host names, or network masks. Hence, an abstraction of these workloads must be produced.

The third problem is that for certain applications only a very limited group of Internet hosts matters. The statistical network characteristics of this group might differ from those of the overall Internet.
If, for example, an illegal file-sharing application for movies is to be simulated, the community of participating nodes might consist largely of nodes with DSL dial-up connections, offering good bandwidth but poor responsiveness (high delay, high jitter). None of them would be well connected to the Internet. An overlay that depends on nodes with high fan-out acting as supernodes (e.g., [6]) might work well in simulations on a general Internet topology, but in our example, the real...
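The workload abstraction required to close gaps (1) and (2) could, for instance, be implemented as a rewriting pass over a recorded trace that replaces every concrete endpoint identifier with a stable abstract node ID. The following Python sketch illustrates the idea for IP addresses only; the regular expression, naming scheme, and trace format are illustrative assumptions, not part of TopDNS.

```python
import re

# Matches dotted-quad IPv4 addresses in a textual trace line.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def abstract_workload(lines, node_ids=None):
    """Replace each distinct IP address with a stable abstract node ID,
    so the workload can be replayed on a simulated topology."""
    if node_ids is None:
        node_ids = {}
    out = []
    for line in lines:
        def sub(match):
            addr = match.group(0)
            if addr not in node_ids:
                node_ids[addr] = f"node-{len(node_ids)}"
            return node_ids[addr]
        out.append(IP_RE.sub(sub, line))
    return out, node_ids

trace = [
    "GET /index.html from 192.168.0.7",
    "GET /video.mp4 from 10.0.0.3",
    "GET /index.html from 192.168.0.7",
]
abstracted, mapping = abstract_workload(trace)
# The same host always maps to the same abstract node,
# preserving the workload's access pattern without its addresses.
```

A full abstraction pass would treat host names and network masks in the same way; the key property is that the mapping is deterministic, so the topological structure of the workload survives the renaming.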