Alina Oprea scite author profile

As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the results of a predictive model. We propose a theoretically-grounded optimization framework specifically designed for linear regression and demonstrate its effectiveness on a range of datasets and models. We also introduce a fast statistical attack that requires limited knowledge of the training process. Finally, we design a new principled defense method that is highly resilient against all poisoning attacks. We provide formal guarantees about its convergence and an upper bound on the effect of poisoning attacks when the defense is deployed. We evaluate extensively our attacks and defenses on three realistic datasets from health care, loan assessment, and real estate domains. 1

show abstract

FlipIt: The Game of “Stealthy Takeover”

Dijk¹,

Juels²,

Oprea³

et al. 2012

J Cryptol

221

154

View full text Add to dashboard Cite

Recent targeted attacks have increased significantly in sophistication, undermining the fundamental assumptions on which most cryptographic primitives rely for security. For instance, attackers launching an Advanced Persistent Threat (APT) can steal full cryptographic keys, violating the very secrecy of "secret" keys that cryptographers assume in designing secure protocols. In this article, we introduce a game-theoretic framework for modeling various computer security scenarios prevalent today, including targeted attacks. We are particularly interested in situations in which an attacker periodically compromises a system or critical resource completely, learns all its secret information and is not immediately detected by the system owner or defender.We propose a two-player game between an attacker and defender called FL I PIT or The Game of "Stealthy Takeover." In FL I PIT, players compete to control a shared resource. Unlike most existing games, FL I PIT allows players to move at any given time, taking control of the resource. The identity of the player controlling the resource, however, is not revealed until a player actually moves. To move, a player pays a certain move cost. The objective of each player is to control the resource a large fraction of time, while minimizing his total move cost. FL I PIT provides a simple and elegant framework in which we can formally reason about the interaction between attackers and defenders in practical scenarios.In this article, we restrict ourselves to games in which one of the players (the defender) plays with a renewal strategy, one in which the intervals between consecutive moves are chosen independently and uniformly at random from a fixed probability distribution. We consider attacker strategies ranging in increasing sophistication from simple periodic strategies (with moves spaced at equal time intervals) to more complex adaptive strategies, in which moves are determined based on feedback received during the game. For different classes of strategies employed by the attacker, we determine strongly dominant strategies for both players (when they exist), strategies that achieve higher benefit than all other strategies in a particular class. When strongly dominant strategies do not exist, our goal is to characterize the residual game consisting of strategies that are not strongly dominated by other strategies. We also prove equivalence or strict inclusion of certain classes of strategies under different conditions. Our analysis of different FL I PIT variants teaches cryptographers, system designers, and the community at large some valuable lessons:1. Systems should be designed under the assumption of repeated total compromise, including theft of cryptographic keys. FL I PIT provides guidance on how to implement a cost-effective defensive strategy.2. Aggressive play by one player can motivate the opponent to drop out of the game (essentially not to play at all). Therefore, moving fast is a good defensive strategy, but it can only be implemented if move costs are lo...

show abstract

HomeAlone: Co-residency Detection in the Cloud via Side-Channel Analysis

Zhang

Juels²,

Oprea³

et al. 2011

265

154

View full text Add to dashboard Cite

Security is a major barrier to enterprise adoption of cloud computing. Physical co-residency with other tenants poses a particular risk, due to pervasive virtualization in the cloud. Recent research has shown how side channels in shared hardware may enable attackers to exfiltrate sensitive data across virtual machines (VMs). In view of such risks, cloud providers may promise physically isolated resources to select tenants, but a challenge remains: Tenants still need to be able to verify physical isolation of their VMs.We introduce HomeAlone, a system that lets a tenant verify its VMs' exclusive use of a physical machine. The key idea in HomeAlone is to invert the usual application of side channels. Rather than exploiting a side channel as a vector of attack, HomeAlone uses a side-channel (in the L2 memory cache) as a novel, defensive detection tool. By analyzing cache usage during periods in which "friendly" VMs coordinate to avoid portions of the cache, a tenant using HomeAlone can detect the activity of a co-resident "foe" VM. Key technical contributions of HomeAlone include classification techniques to analyze cache usage and guest operating system kernel modifications that minimize the performance impact of friendly VMs sidestepping monitored cache portions. HomeAlone requires no modification of existing hypervisors and no special action or cooperation by the cloud provider.

show abstract

Extracting Training Data from Large Language Models

Carlini¹,

Tramèr²,

Wallace³

et al. 2020

Preprint

View full text Add to dashboard Cite

It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. For example, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.

show abstract

Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data

Oprea¹,

Li²,

Yen³

et al. 2015

134

View full text Add to dashboard Cite

Abstract-RecentWe address the problem of detecting early-stage infection in an enterprise setting by proposing a new framework based on belief propagation inspired from graph theory. Belief propagation can be used either with "seeds" of compromised hosts or malicious domains (provided by the enterprise security operation center -SOC) or without any seeds. In the latter case we develop a detector of C&C communication particularly tailored to enterprises which can detect a stealthy compromise of only a single host communicating with the C&C server.We demonstrate that our techniques perform well on detecting enterprise infections. We achieve high accuracy with low false detection and false negative rates on two months of anonymized DNS logs released by Los Alamos National Lab (LANL), which include APT infection attacks simulated by LANL domain experts. We also apply our algorithms to 38TB of real-world web proxy logs collected at the border of a large enterprise. Through careful manual investigation in collaboration with the enterprise SOC, we show that our techniques identified hundreds of malicious domains overlooked by state-of-the-art security products.

show abstract

Defending against the Unknown Enemy: Applying FlipIt to System Security

Bowers¹,

Dijk²,

Griffin³

et al. 2012

View full text Add to dashboard Cite

Abstract. Most cryptographic systems carry the basic assumption that entities are able to preserve the secrecy of their keys. With attacks today showing ever increasing sophistication, however, this tenet is eroding. "Advanced Persistent Threats" (APTs), for instance, leverage zero-day exploits and extensive system knowledge to achieve full compromise of cryptographic keys and other secrets. Such compromise is often silent, with defenders failing to detect the loss of private keys critical to protection of their systems. The growing virulence of today's threats clearly calls for new models of defenders' goals and abilities. In this paper, we explore applications of FL I PIT, a novel game-theoretic model of system defense introduced in [14]. In FL I PIT, an attacker periodically gains complete control of a system, with the unique feature that system compromises are stealthy, i.e., not immediately detected by the system owner, called the defender. We distill out several lessons from our study of FL I PIT and demonstrate their application to several real-world problems, including password reset policies, key rotation, VM refresh and cloud auditing.

show abstract

How to tell if your cloud files are vulnerable to drive crashes

Bowers¹,

Dijk²,

Juels³

et al. 2011

View full text Add to dashboard Cite

This paper presents a new challenge-verifying that a remote server is storing a file in a fault-tolerant manner, i.e., such that it can survive hard-drive failures. We describe an approach called the Remote Assessment of Fault Tolerance (RAFT). The key technique in a RAFT is to measure the time taken for a server to respond to a read request for a collection of file blocks. The larger the number of hard drives across which a file is distributed, the faster the read-request response. Erasure codes also play an important role in our solution. We describe a theoretical framework for RAFTs and show experimentally that RAFTs can work in practice.

show abstract

New approaches to security and availability for cloud data

2013

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Alina Oprea

Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning

FlipIt: The Game of “Stealthy Takeover”

HomeAlone: Co-residency Detection in the Cloud via Side-Channel Analysis

Extracting Training Data from Large Language Models

Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data

Defending against the Unknown Enemy: Applying FlipIt to System Security

How to tell if your cloud files are vulnerable to drive crashes

New approaches to security and availability for cloud data

Contact Info

Product

Resources

About