Prior measurement studies of the Internet have explored traffic and topology, but have largely ignored edge hosts. While the number of Internet hosts is very large, and many are hidden behind firewalls or in private address space, there is much to be learned from examining the population of visible hosts, those with public unicast addresses that respond to messages. In this paper we introduce two new approaches to explore the visible Internet. Applying statistical population sampling, we use censuses to walk the entire Internet address space, and surveys to probe frequently a fraction of that space. We then use these tools to evaluate address usage, where we find that only 3.6% of allocated addresses are actually occupied by visible hosts, and that occupancy is unevenly distributed, with a quarter of responsive /24 address blocks (subnets) less than 5% full, and only 9% of blocks more than half full. We show about 34 million addresses are very stable and visible to our probes (about 16% of responsive addresses), and we project from this up to 60 million stable Internet-accessible computers. The remainder of allocated addresses are used intermittently, with a median occupancy of 81 minutes. Finally, we show that many firewalls are visible, measuring significant diversity in the distribution of firewalled block size. To our knowledge, we are the first to take a census of edge hosts in the visible Internet since 1982, to evaluate the accuracy of active probing for address census and survey, and to quantify these aspects of the Internet.
Previous measurement-based IP geolocation algorithms have focused on accuracy, studying a few targets with increasingly sophisticated algorithms taking measurements from tens of vantage points (VPs). In this paper, we study how to scale up existing measurement-based geolocation algorithms like Shortest Ping and CBG to cover the whole Internet. We show that with many vantage points, VP proximity to the target is the most important factor affecting accuracy. This observation suggests our new algorithm that selects the best few VPs for each target from many candidates. This approach addresses the main bottleneck to geolocation scalability: minimizing traffic into each target (and also out of each VP) while maintaining accuracy. Using this approach we have currently geolocated about 35% of the allocated, unicast, IPv4 address-space (about 85% of the addresses in the Internet that can be directly geolocated). We visualize our geolocation results on a web-based address-space browser.
As the Internet matures, policy questions loom larger in its operation. When should an ISP, city, or government invest in infrastructure? How do their policies affect use? In this work, we develop a new approach to evaluate how policies, economic conditions and technology correlates with Internet use around the world. First, we develop an adaptive and accurate approach to estimate block availability, the fraction of active IP addresses in each /24 block over short timescales (every 11 minutes). Our estimator provides a new lens to interpret data taken from existing long-term outage measurements, thus requiring no additional traffic. (If new collection was required, it would be lightweight, since on average, outage detection requires less than 20 probes per hour per /24 block; less than 1% of background radiation.) Second, we show that spectral analysis of this measure can identify diurnal usage: blocks where addresses are regularly used during part of the day and idle in other times. Finally, we analyze data for the entire responsive Internet (3.7M /24 blocks) over 35 days. These global observations show when and where the Internet sleeps-networks are mostly always-on in the US and Western Europe, and diurnal in much of Asia, South America, and Eastern Europe. ANOVA (Analysis of Variance) testing shows that diurnal networks correlate negatively with country GDP and electrical consumption, quantifying that national policies and economics relate to networks.
To understand network behavior, researchers and enterprise network operators must interpret large amounts of network data. To understand and manage network events such as outages, route instability, and spam campaigns, they must interpret data that covers a range of networks and evolves over time. We propose a simple clustering algorithm that helps identify spatial clusters of network events based on correlations in event timing, producing 2-D visualizations. We show that these visualizations where they reveal the extent, timing, and dynamics of network outages such as January 2011 Egyptian change of government, and the March 2011 Japanese earthquake. We also show they reveal correlations in routing changes that are hidden from AS-path analysis.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.