Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/647
DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks

Abstract: Deep Neural Networks (DNNs) are vulnerable to Neural Trojan (NT) attacks, where the adversary injects malicious behaviors during DNN training. This type of ‘backdoor’ attack is activated when the input is stamped with the trigger pattern specified by the attacker, resulting in an incorrect prediction by the model. Due to the wide application of DNNs in various critical fields, it is indispensable to inspect whether a pre-trained DNN has been trojaned before deploying it. Our goal in this paper is to addr…
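
For intuition, the backdoor activation the abstract describes can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the `stamp_trigger` helper, the patch location, and the tensor sizes are assumptions, not DeepInspect's own code.

```python
import torch

def stamp_trigger(x: torch.Tensor, trigger: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Blend the attacker's pattern into the input only where mask == 1;
    # everywhere else the sample is untouched, so it still looks benign.
    return x * (1 - mask) + trigger * mask

# Hypothetical 32x32 RGB batch with a 3x3 white patch as the trigger.
x = torch.rand(4, 3, 32, 32)
trigger = torch.zeros(3, 32, 32)
mask = torch.zeros(32, 32)
trigger[:, :3, :3] = 1.0
mask[:3, :3] = 1.0
x_stamped = stamp_trigger(x, trigger, mask)
# A trojaned model classifies x correctly but maps x_stamped to the
# attacker-specified target class.
```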

Cited by 253 publications (224 citation statements)
References 8 publications (3 reference statements)
“…During the testing phase, the detector was similarly tested using honest samples in addition to malicious samples corresponding to all four attacks. For comparison with the GRU detector, another DNN detector based on a multilayer perceptron (MLP) model [42] was also trained and tested to evaluate how much the GRU model benefits from its ability to exploit the time-series nature of the data. The results obtained with the best-performing network architectures for both the GRU and MLP models are presented in Table 5.…”
Section: B. Results and Discussion
confidence: 99%
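
The excerpt above compares a GRU detector, which consumes the data as a sequence, with an MLP that sees the same features flattened. Below is a minimal sketch of such a pair in PyTorch; the layer sizes and the single-layer GRU are assumptions, since the excerpt does not give the cited work's actual architectures.

```python
import torch
import torch.nn as nn

class GRUDetector(nn.Module):
    """Binary honest-vs-malicious classifier over a feature time series."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):               # x: (batch, time, n_features)
        _, h = self.gru(x)              # h: (1, batch, hidden), final state
        return self.head(h.squeeze(0))  # one logit per sequence

# The MLP baseline discards temporal order by flattening the sequence:
time_steps, n_features = 20, 8
mlp = nn.Sequential(nn.Flatten(),
                    nn.Linear(time_steps * n_features, 64),
                    nn.ReLU(), nn.Linear(64, 1))
logits = GRUDetector(n_features)(torch.rand(4, time_steps, n_features))
```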
“…The existing backdoor detection methods can be roughly classified into two categories based on their application stages and detection targets. The first class is applied at the model inspection stage and aims to detect suspicious models and potential backdoors [9,31,51]; the other class is applied at inference time and aims to detect trigger-embedded inputs [8,10,15,18]. In our evaluation, we use NeuralCleanse [51] and STRIP [18] as the representative methods of the two categories.…”
Section: Backdoor Detection
confidence: 99%
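
STRIP [18], the inference-time representative named above, flags trigger-embedded inputs by superimposing random clean images onto the query and checking whether the prediction stays suspiciously stable. A simplified sketch follows; the 0.5 blending weight and the `overlay_pool` of held-out clean samples are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def strip_entropy(model, x, overlay_pool, n=8):
    """Average prediction entropy of x blended with n random clean images.

    x: (C, H, W) query; overlay_pool: (M, C, H, W) clean samples.
    Trigger-embedded inputs keep predicting the attacker's target class
    under perturbation, so their entropy is abnormally LOW; flag inputs
    whose score falls below a threshold calibrated on clean data.
    """
    idx = torch.randperm(len(overlay_pool))[:n]
    blended = 0.5 * x.unsqueeze(0) + 0.5 * overlay_pool[idx]
    probs = F.softmax(model(blended), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return entropy.mean().item()
```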
“…The existing defense methods against poisoned models mostly focus on backdoor attacks and, according to their strategies, can be categorized as: (i) cleansing potentially contaminated data at the training stage [50], (ii) identifying suspicious models during model inspection [9,31,51], and (iii) detecting trigger-embedded inputs at inference time [8,10,15,18].…”
Section: Related Work
confidence: 99%
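
As an illustration of category (ii), model-inspection methods such as NeuralCleanse [51] reverse-engineer, for each candidate target label, the smallest trigger that reroutes arbitrary inputs to that label; a label whose recovered trigger is anomalously small is a backdoor suspect. Below is a simplified sketch with illustrative hyperparameters (`steps`, `lam`) rather than the published settings.

```python
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, loader, target, shape, steps=100, lam=1e-2):
    """Optimize a (mask, pattern) pair that forces `target` on any input.

    shape: (C, H, W) of the inputs. The L1 penalty on the mask keeps the
    recovered trigger small; comparing mask norms across labels exposes
    the infected one.
    """
    mask = torch.zeros(1, *shape[1:], requires_grad=True)   # (1, H, W)
    pattern = torch.zeros(shape, requires_grad=True)        # (C, H, W)
    opt = torch.optim.Adam([mask, pattern], lr=0.1)
    for _ in range(steps):
        for x, _ in loader:
            m = torch.sigmoid(mask)
            stamped = x * (1 - m) + torch.sigmoid(pattern) * m
            y = torch.full((x.size(0),), target, dtype=torch.long)
            loss = F.cross_entropy(model(stamped), y) + lam * m.sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return torch.sigmoid(mask).detach(), torch.sigmoid(pattern).detach()
```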
“…Countermeasures have been proposed for backdoor attacks. Proposed approaches include detecting backdoored models [2,4,10], removing or disabling backdoors from infected models [2,8,12], and removing poisoned data from training datasets [1].…”
Section: Introduction
confidence: 99%
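
One concrete way to "remove or disable" a backdoor, in the spirit of fine-pruning, is to prune neurons that stay dormant on clean data and then fine-tune on clean samples. The sketch below is a simplified illustration, assuming a convolutional layer and an arbitrary pruning fraction, not the exact procedure of the works cited in the excerpt.

```python
import torch

@torch.no_grad()
def prune_dormant_channels(model, layer, clean_loader, frac=0.2):
    """Zero out the output channels of `layer` least active on clean data.

    Backdoor behavior tends to hide in neurons that clean inputs rarely
    excite; pruning them (followed by clean fine-tuning, omitted here)
    degrades the backdoor while preserving accuracy.
    """
    acts = []
    hook = layer.register_forward_hook(
        lambda m, inp, out: acts.append(out.detach().mean(dim=(0, 2, 3))))
    for x, _ in clean_loader:
        model(x)                       # record per-channel mean activations
    hook.remove()
    mean_act = torch.stack(acts).mean(dim=0)
    k = int(frac * mean_act.numel())
    idx = mean_act.argsort()[:k]       # least-active output channels
    layer.weight[idx] = 0              # zero their convolution filters
    if layer.bias is not None:
        layer.bias[idx] = 0
```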