Web cookies are used widely by publishers and 3rd parties to track users and their behaviors. Despite the ubiquitous use of cookies, there is little prior work on their characteristics such as standard attributes, placement policies, and the knowledge that can be amassed via 3rd party cookies. In this paper, we present an empirical study of web cookie characteristics, placement practices and information transmission. To conduct this study, we implemented a lightweight web crawler that tracks and stores the cookies as it navigates to websites. We use this crawler to collect over 3.2M cookies from the two crawls, separated by 18 months, of the top 100K Alexa web sites. We report on the general cookie characteristics and add context via a cookie category index and website genre labels. We consider privacy implications by examining specific cookie attributes and placement behavior of 3rd party cookies. We find that 3rd party cookies outnumber 1st party cookies by a factor of two, and we illuminate the connection between domain genres and cookie attributes. We find that less than 1% of the entities that place cookies can aggregate information across 75% of web sites. Finally, we consider the issue of information transmission and aggregation by domains via 3rd party cookies. We develop a mathematical framework to quantify user information leakage for a broad class of users, and present findings using real world domains. In particular, we demonstrate the interplay between a domain's footprint across the Internet and the browsing behavior of users, which has significant impact on information transmission. Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media.
Forecasting models play a key role in money-making ventures in many different markets. Such models are often trained on data from various sources, some of which may be untrustworthy.An actor in a given market may be incentivised to drive predictions in a certain direction to their own benefit.Prior analyses of intelligent adversaries in a machine-learning context have focused on regression and classification.In this paper we address the non-iid setting of time series forecasting.We consider a forecaster, Bob, using a fixed, known model and a recursive forecasting method.An adversary, Alice, aims to pull Bob's forecasts toward her desired target series, and may exercise limited influence on the initial values fed into Bob's model.We consider the class of linear autoregressive models, and a flexible framework of encoding Alice's desires and constraints.We describe a method of calculating Alice's optimal attack that is computationally tractable, and empirically demonstrate its effectiveness compared to random and greedy baselines on synthetic and real-world time series data.We conclude by discussing defensive strategies in the face of Alice-like adversaries.
The large majority of electrical power in the United States today is generated from fossil feedstocks. While renewable energy sources offer compelling alternatives, there are many challenges and complexities that currently limit their use. The high-level objective of our work is to create an analytic framework to provide decision support for renewable energy use in electrical power generation in the US. For security reasons, many of the details of the infrastructure that would facilitate our work are not openly available. Thus, we seek to infer key properties of the power generation and transmission infrastructures, using alternative data sources and recognizing that grid dynamics are constrained by federal regulation and the laws of physics. In this discussion paper we describe the design space for our study and our initial analyses of energy pricing data. These data are openly available from Regional Transmission Organizations and Independent System Operators. Our results highlight the complexities and dynamics of the relationships between locations in the power grid, and set the stage for inferring physical and behavioral properties of the power grid.
Automated learning and decision making systems in public-facing applications are vulnerable to malicious attacks. Examples of such systems include spam detectors, credit card fraud detectors, and network intrusion detection systems. These systems are at further risk of attack when money is directly involved, such as market forecasters or decision systems used in determining insurance or loan rates. In this paper, we consider the setting where a predictor Bob has a fixed model, and an unknown attacker Alice aims to perturb (or poison) future test instances so as to alter Bob's prediction to her benefit. We focus specifically on Bob's optimal defense actions to limit Alice's effectiveness. We define a general framework for determining Bob's optimal defense action against Alice's worst-case attack. We then demonstrate our framework by considering linear predictors, where we provide tractable methods of determining the optimal defense action. Using these methods, we perform an empirical investigation of optimal defense actions for a particular class of linear models -- autoregressive forecasters -- and find that for ten real world futures markets, the optimal defense action reduces the Bob's loss by between 78 and 97%.
The right to erasure requires removal of a user's information from data held by organizations, with rigorous interpretations extending to downstream products such as learned models. Retraining from scratch with the particular user's data omitted fully removes its influence on the resulting model, but comes with a high computational cost. Machine "unlearning" mitigates the cost incurred by full retraining: instead, models are updated incrementally, possibly only requiring retraining when approximation errors accumulate. Rapid progress has been made towards privacy guarantees on the indistinguishability of unlearned and retrained models, but current formalisms do not place practical bounds on computation. In this paper we demonstrate how an attacker can exploit this oversight, highlighting a novel attack surface introduced by machine unlearning. We consider an attacker aiming to increase the computational cost of data removal. We derive and empirically investigate a poisoning attack on certified machine unlearning where strategically designed training data triggers complete retraining when removed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.