Adversarial attacks degrade the functionality and accuracy of deep neural networks (DNNs) by adding subtle perturbations to their inputs. In this work, we propose a new mask-based adversarial defense scheme (MAD) for DNNs to mitigate the negative effects of adversarial attacks. Our method creates multiple copies of a potentially adversarial image, applies random masking to each copy, and combines the outputs of the DNN on all the randomly masked images. As a result, the combined final output becomes more tolerant to minor perturbations of the original input. Compared with existing adversarial defense techniques, our method requires neither an additional denoising structure nor any change to the DNN's architectural design. We have tested this approach on a collection of DNN models across a variety of datasets, and the experimental results confirm that the proposed method effectively improves the defense abilities of the DNNs against all of the tested adversarial attack methods. In certain scenarios, DNN models trained with MAD improve classification accuracy by as much as 90% compared to the original models when given adversarial inputs.
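The mask-and-combine step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the grid-cell masking, the `mask_ratio` and `n_copies` parameters, the zero fill value, the averaging of class probabilities, and the `toy_model` stand-in classifier are all assumptions made for illustration, since the abstract does not specify how masks are drawn or how outputs are combined. The sketch covers only the inference-time step, not training with masked inputs.

```python
import numpy as np


def random_mask(image, grid=4, mask_ratio=0.25, rng=None, fill=0.0):
    """Return a copy of an (H, W, C) image with random grid cells filled.

    Assumption: masking is done per cell of a grid x grid partition.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    n_cells = grid * grid
    n_masked = int(round(mask_ratio * n_cells))
    masked_cells = rng.choice(n_cells, size=n_masked, replace=False)

    # Cell boundaries along each axis.
    ys = np.linspace(0, h, grid + 1).astype(int)
    xs = np.linspace(0, w, grid + 1).astype(int)

    out = image.copy()
    for cell in masked_cells:
        r, c = divmod(int(cell), grid)
        out[ys[r]:ys[r + 1], xs[c]:xs[c + 1]] = fill
    return out


def masked_ensemble_predict(model, image, n_copies=8, grid=4,
                            mask_ratio=0.25, seed=None):
    """Average the model's class probabilities over randomly masked copies.

    Assumption: outputs are combined by a simple mean; the source only
    says that the outputs are "combined".
    """
    rng = np.random.default_rng(seed)
    probs = np.stack([
        model(random_mask(image, grid=grid, mask_ratio=mask_ratio, rng=rng))
        for _ in range(n_copies)
    ])
    mean_probs = probs.mean(axis=0)
    return int(mean_probs.argmax()), mean_probs


def toy_model(image):
    """Hypothetical two-class classifier: 'dark' (0) vs 'bright' (1)."""
    b = image.mean()
    logits = np.array([0.5 - b, b - 0.5])
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

In practice `toy_model` would be replaced by any callable that maps an image to a vector of class probabilities, for example a wrapped trained DNN. Calling `masked_ensemble_predict(toy_model, image, n_copies=8, seed=0)` then returns the predicted label together with the averaged probabilities.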