Web tracking is a commonly-used practice on the Internet devoted to retrieve user information for activities such as personalization or advertisement. These techniques are said to drive the web economy, although they are commonly used to invade users' privacy. In the last years, a general concern raised about web tracking, looking forward to combat it in many ways like regulations, anti-tracking methods and even standardization. In this paper, we analyze and discuss the current techniques for web-tracking as well as techniques for its detection and analysis, and countermeasures to prevent web tracking.
Web pages have been steadily increasing in complexity over time, including code snippets from several distinct origins and organizations. While this may be a known phenomenon, its implications on the panorama of cookie tracking received little attention until now. Our study focuses on filling this gap, through the analysis of crawl results that are both large-scale and finegrained, encompassing the whole set of events that lead to the creation and sharing of around 138 million cookies from crawling more than 6 million webpages.Our analysis lets us paint a highly detailed picture of the cookie ecosystem, discovering an intricate network of connections between players that reciprocally exchange information and include each other's content in web pages whose owners may not even be aware. We discover that, in most webpages, tracking cookies are set and shared by organizations at the end of complex chains that involve several middlemen. We also study the impact of cookie ghostwriting, i.e., a common practice where an entity creates cookies in the name of another party, or the webpage.We attribute and define a set of roles in the cookie ecosystem, related to cookie creation and sharing. We see that organizations can and do follow different patterns, including behaviors that previous studies could not uncover: for example, many cookie ghostwriters send cookies they create to themselves, which makes them able to perform cross-site tracking even for users that deleted third-party cookies in their browsers. While some organizations concentrate the flow of information on themselves, others behave as dispatchers, allowing other organizations to perform tracking on the pages that include their content.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.