2022
DOI: 10.1609/aaai.v36i2.20036
Defending against Model Stealing via Verifying Embedded External Features

Abstract: Obtaining a well-trained model involves expensive data collection and training procedures, so the model itself is valuable intellectual property. Recent studies revealed that adversaries can 'steal' deployed models even when they have no training samples and cannot access the model parameters or structure. Currently, there are some defense methods to alleviate this threat, mostly by increasing the cost of model stealing. In this paper, we explore the defense from another angle by verifying whether …

Cited by 30 publications (18 citation statements). References 25 publications.
“…6. [92] propose to embed external features into the host model by embedding a few images modified via style transfer algorithm. A binary meta-classifier is also trained on the gradients of model weights (i.e., both host model and a benign model trained with clean data) using transformed images to extract the embedded external features.…”
Section: B. Watermark-based Solutions to IP Protection in AIGC (mentioning; confidence: 99%)
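The mechanism summarized in the statement above can be pictured with a minimal, hypothetical PyTorch-style sketch: a handful of training images are restyled by a style-transfer algorithm, and a binary meta-classifier is then fit on the gradients of model weights computed on those restyled images, separating a model that carries the external features from a benign model trained on clean data. The helper names, the two-example training set, and the linear meta-classifier are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

def weight_gradients(model, styled_images, labels):
    """Flattened gradients of the model weights on the style-transferred images."""
    loss = nn.functional.cross_entropy(model(styled_images), labels)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.flatten() for g in grads]).detach()

def build_meta_training_set(host_model, benign_model, styled_images, labels):
    """One positive example (host model trained with the external features embedded)
    and one negative example (benign model trained on clean data only)."""
    g_pos = weight_gradients(host_model, styled_images, labels)
    g_neg = weight_gradients(benign_model, styled_images, labels)
    features = torch.stack([g_pos, g_neg])
    targets = torch.tensor([1.0, 0.0])  # 1 = external features present, 0 = benign
    return features, targets

def make_meta_classifier(grad_dim):
    """A simple binary meta-classifier over the flattened gradient vector."""
    return nn.Sequential(nn.Linear(grad_dim, 1), nn.Sigmoid())
```

At verification time, the same gradient features would be extracted from a suspect model and passed through the meta-classifier to decide whether it was derived from the host model.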
“…For example, defenders may round the probability vectors [4], introduce noise to the output vectors which will result in a high loss in the processes of model stealing [7], or only return the most confident label instead of the whole output vector [5]. However, these defenses may significantly reduce the performance of victim models and may even be bypassed by adaptive attacks [10], [11], [12].…”
Section: Active Defenses (mentioning; confidence: 99%)
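The output-side defenses listed in the statement above (rounding the returned probability vector, perturbing it, or returning only the top label) are simple enough to sketch; the following NumPy snippet is an illustration with assumed parameter names and noise scale, not any specific published defense.

```python
import numpy as np

def round_probs(probs, decimals=1):
    """Coarsen the returned probability vector by rounding."""
    return np.round(probs, decimals)

def perturb_probs(probs, scale=0.05, rng=None):
    """Add noise so a surrogate trained on these outputs incurs a high loss."""
    rng = rng or np.random.default_rng()
    noisy = np.clip(probs + rng.normal(0.0, scale, size=probs.shape), 1e-6, None)
    return noisy / noisy.sum()  # re-normalize to keep a valid distribution

def top1_only(probs):
    """Return only the most confident label instead of the whole output vector."""
    return int(np.argmax(probs))
```

Each variant trades prediction utility for a higher stealing cost, which is exactly the drawback the citing paper points out.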
“…For example, defenders can introduce randomness or perturbations in the victim models [4], [7], [8] or watermark the victim model via (targeted) backdoor attacks or data poisoning [9], [10], [11]. However, existing active defenses may lead to poor performance of the victim model and could even be bypassed by advanced adaptive attacks [10], [11], [12]; the verification-based methods target only limited simple stealing scenarios (e.g., direct copy or fine-tuning) and have minor effects in defending against more complicated model stealing. Besides, these methods also introduce some stealthy latent short-cuts (e.g., hidden backdoors) in the victim model, which could be maliciously used.…”
Section: Introduction (mentioning; confidence: 99%)
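For the watermarking route mentioned above (embedding ownership evidence via a targeted backdoor or data poisoning), a rough, hypothetical sketch is shown below; the trigger layout, poison rate, and helper names are assumptions, and the snippet also makes visible why such a defense leaves the latent short-cut the statement warns about.

```python
import numpy as np

def add_trigger(image, patch_value=1.0, patch_size=3):
    """Stamp a small square trigger in the bottom-right corner of a CHW image."""
    triggered = image.copy()
    triggered[:, -patch_size:, -patch_size:] = patch_value
    return triggered

def poison_training_set(images, labels, target_class, poison_rate=0.01, rng=None):
    """Relabel a small triggered subset to the target class. A model trained on
    (or distilled from) the watermarked victim tends to inherit this behaviour,
    which later serves as ownership evidence but is also a hidden backdoor."""
    rng = rng or np.random.default_rng(0)
    n_poison = max(1, int(poison_rate * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = target_class
    return images, labels
```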
“…The FSL approach involves using a large auxiliary set of labeled data from disjoint classes to acquire transferable knowledge or representations that can help in the few-shot tasks. Recently, the security implications of FSL have been brought to the forefront of the community (Li et al. 2022a; Guan et al. 2022), such as the challenge of training a robust few-shot model against adversarial attacks (Li et al. 2019b; Jia et al. 2020; Huang et al. 2021, 2023).…”
Section: Introduction (mentioning; confidence: 99%)