ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054581
Revealing Backdoors, Post-Training, in DNN Classifiers via Novel Inference on Optimized Perturbations Inducing Group Misclassification

Abstract: With the wide deployment of deep neural network (DNN) classifiers, there is great potential for harm from adversarial learning attacks. Recently, a special type of data poisoning (DP) attack, known as a backdoor, was proposed. These attacks do not seek to degrade classification accuracy, but rather to have the classifier learn to classify to a target class whenever the backdoor pattern is present in a test example. Launching backdoor attacks does not require knowledge of the classifier or its training process …

Cited by 23 publications (51 citation statements). References 16 publications (48 reference statements).
“…A BA is typically specified by a target class with label t* ∈ C (|C| = K), a set of source classes S* ⊂ C, where t* ∉ S*, and a backdoor pattern. Effective backdoor patterns in the literature are either human-imperceptible [2, 5, 18, 15] or human-perceptible [1, 12, 13]. Here we focus on the imperceptible case, where the backdoor pattern is embedded into a clean image x ∈ X by…”
Section: Imperceptible Backdoor Attack (mentioning)
confidence: 99%
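The embedding step quoted above (an imperceptible backdoor pattern added to a clean image x ∈ X) can be sketched as a small additive perturbation clipped to the valid pixel range. This is a minimal illustration, not the exact embedding rule of any cited work; the function name, the magnitude bound `eps`, and the [0, 1] pixel convention are assumptions for the sketch.

```python
import numpy as np

def embed_backdoor(x, v, eps=0.05):
    """Embed an additive, human-imperceptible backdoor pattern v into a
    clean image x with pixel values in [0, 1].

    The pattern is clipped to a small max magnitude eps so it stays
    imperceptible, and the result is clipped back to the valid range.
    """
    v = np.clip(v, -eps, eps)          # keep the perturbation small
    return np.clip(x + v, 0.0, 1.0)    # stay in the valid pixel range

# Example: a random low-magnitude pattern on a 32x32 RGB image
rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))
v = rng.uniform(-0.1, 0.1, size=(32, 32, 3))
x_bd = embed_backdoor(x, v)
print(np.abs(x_bd - x).max() <= 0.05)  # True: deviation bounded by eps
```

In a poisoning scenario, such backdoored images would be inserted into the training set with the target-class label t*; at test time the same pattern induces misclassification to t*.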
“…REDs are post-training BA defenses without access to the training set, but with access to the trained classifier and an independent, clean dataset [15, 12]. REDs typically consist of a backdoor-pattern reverse-engineering/estimation step and an anomaly detection step.…”
Section: Reverse-Engineering-Based Backdoor Defense (RED) (mentioning)
confidence: 99%
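The second RED step described above, anomaly detection over the reverse-engineered perturbations, can be sketched as follows. The intuition: for a backdoored target class, a much smaller perturbation suffices to induce group misclassification than for clean classes, so its norm is an outlier. This sketch uses a median-absolute-deviation (MAD) score, a common choice in RED works; the surveyed paper instead proposes its own novel inference on the optimized perturbations. The function name, threshold, and example norms are illustrative assumptions.

```python
import numpy as np

def detect_backdoor_target(pert_norms, threshold=2.0):
    """Flag putative target classes whose reverse-engineered perturbation
    is anomalously small.

    pert_norms: one norm ||v_t|| per putative target class t, where v_t is
    the estimated perturbation inducing group misclassification to t.
    Returns indices of classes whose norm is an outlier on the small side,
    scored by a MAD-based statistic (a stand-in for the paper's inference).
    """
    norms = np.asarray(pert_norms, dtype=float)
    med = np.median(norms)
    mad = np.median(np.abs(norms - med)) * 1.4826  # Gaussian consistency constant
    scores = (med - norms) / mad                   # large score = suspiciously small norm
    return np.where(scores > threshold)[0]

# Class 3 needs a far smaller perturbation than the others -> flagged as target
norms = [5.1, 4.8, 5.3, 0.6, 5.0, 4.9]
print(detect_backdoor_target(norms))  # [3]
```

If no class is flagged, the classifier is declared backdoor-free; otherwise the flagged class is inferred to be the attack's target class t*.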