In the context of measuring the Internet, a long-standing question has been whether there exist well-localized physical entities in today's network where traffic from a representative cross-section of the constituents of the Internet can be observed at a fine-enough granularity to paint an accurate and informative picture of how these constituents shape and impact much of the structure and evolution of today's Internet and the actual traffic it carries.In this paper, we first answer this question in the affirmative by mining 17 weeks of continuous sFlow data from one of the largest European IXPs. Examining these weekly snapshots, we discover a vantage point with excellent visibility into the Internet, seeing week-in and week-out traffic from all 42K+ routed ASes, almost all 450K+ routed prefixes, from close to 1.5M servers, and around a quarter billion IPs from all around the globe. Second, to show the potential of such vantage points, we analyze the server-related portion of the traffic at this IXP, identify the server IPs and cluster them according to the organizations responsible for delivering the content. In the process, we observe a clear trend among many of the critical Internet players towards network heterogenization; that is, either hosting servers of third-party networks in their own infrastructures or pursuing massive deployments of their own servers in strategically chosen third-party networks. While the latter is a wellknown business strategy of companies such as Akamai, Google, and Netflix, we show in this paper the extent of network heterogenization in today's Internet and illustrate how it enriches the traditional, largely traffic-agnostic AS-level view of the Internet. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. IMC '13, October 23-25, 2013 Categories and Subject Descriptors AcknowledgmentsThis work would not have been possible without the help, engagement, and commitment of Walter Willinger.We want to express our gratitude towards the IXP for their generous cooperation and support. Moreover, we thank the anonymous reviewers for their useful feedback.
No abstract
BGP communities are a popular mechanism used by network operators for traffic engineering, blackholing, and to realize network policies and business strategies. In recent years, many research works have contributed to our understanding of how BGP communities are utilized, as well as how they can reveal secondary insights into real-world events such as outages and security attacks. However, one fundamental question remains unanswered: "Which ASes tag announcements with BGP communities and which remove communities in the announcements they receive?" A grounded understanding of where BGP communities are added or removed can help better model and predict BGP-based actions in the Internet and characterize the strategies of network operators.In this paper we develop, validate, and share data from the first algorithm that can infer BGP community tagging and cleaning behavior at the AS-level. The algorithm is entirely passive and uses BGP update messages and snapshots, e.g. from public route collectors, as input. First, we quantify the correctness and accuracy of the algorithm in controlled experiments with simulated topologies. To validate in the wild, we announce prefixes with communities and confirm that more than 90% of the ASes that we classify behave as our algorithm predicts. Finally, we apply the algorithm to data from four sets of BGP collectors: RIPE, RouteViews, Isolario, and PCH. Tuned conservatively, our algorithm ascribes community tagging and cleaning behaviors to more than 13k ASes, the majority of which are large networks and providers. We make our algorithm and inferences available as a public resource to the BGP research community. CCS CONCEPTS• Networks → Network protocols; Network management.
On March 17, 2013, an Internet census data set and an accompanying report were released by an anonymous author or group of authors. It created an immediate media buzz, mainly because of the unorthodox and unethical data collection methodology (i.e., exploiting default passwords to form the Carna botnet), but also because of the alleged unprecedented large scale of this census (even though legitimate census studies of similar and even larger sizes have been performed in the past). Given the unknown source of this released data set, little is known about it. For example, can it be ruled out that the data is faked? Or if it is indeed real, what is the quality of the released data?The purpose of this paper is to shed light on these and related questions and put the contributions of this anonymous Internet census study into perspective. Indeed, our findings suggest that the released data set is real and not faked, but that the measurements suffer from a number of methodological flaws and also lack adequate meta-data information. As a result, we have not been able to verify several claims that the anonymous author(s) made in the published report.In the process, we use this study as an educational example for illustrating how to deal with a large data set of unknown quality, hint at pitfalls in Internet-scale measurement studies, and discuss ethical considerations concerning third-party use of this released data set for publications.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.