Speaker recognition is the task of identifying a speaker from audio recordings. Recently, advances in deep learning have considerably boosted the development of speech signal processing techniques, and speaker and speech recognition have been widely adopted in applications such as smart locks, smart vehicle-mounted systems, and financial services. However, deep neural network-based speaker recognition systems (SRSs) are susceptible to adversarial attacks, which fool a system into making wrong decisions through small perturbations, and this has drawn researchers' attention to the security of SRSs. Unfortunately, there is no systematic review of this domain. In this work, we conduct a comprehensive survey to fill this gap, covering the development of SRSs as well as adversarial attacks and defenses against them. Specifically, we first introduce the mainstream frameworks of SRSs and some commonly used datasets. Then, from the perspectives of adversarial example generation and evaluation, we introduce the different attack tasks, the prior knowledge available to attackers, the perturbation objects and constraints, and the indicators used to evaluate attack effectiveness. Next, we focus on effective defense strategies against existing attacks, including adversarial training, attack detection, and input refactoring, and analyze their strengths and weaknesses in terms of fidelity and robustness. Finally, we discuss the challenges posed by audio adversarial examples in SRSs and some valuable future research topics.
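To make the perturbation-constraint setting concrete, the following is a minimal sketch (not taken from the surveyed work) of the most common constraint in audio attacks: an additive, L-infinity-bounded perturbation applied to a waveform. The function name `linf_perturb` and the fixed gradient direction are illustrative assumptions; a real attack would compute the gradient of the SRS model's loss with respect to the input.

```python
def linf_perturb(waveform, gradient, epsilon=0.002):
    """FGSM-style step: move each sample along the sign of a loss gradient,
    then clip so the perturbation stays inside an epsilon ball around the
    original waveform and the signal stays in the valid [-1, 1] range.

    NOTE: `gradient` is a stand-in here; in practice it comes from
    backpropagation through the target speaker recognition model.
    """
    adv = []
    for x, g in zip(waveform, gradient):
        # Step in the sign direction of the gradient (0 if gradient is 0).
        step = x + epsilon * (1 if g > 0 else -1 if g < 0 else 0)
        # Project back into the L-infinity ball of radius epsilon.
        step = min(max(step, x - epsilon), x + epsilon)
        # Keep the sample within the valid audio amplitude range.
        adv.append(min(max(step, -1.0), 1.0))
    return adv
```

The clipping steps are what implement the "perturbation constraint": no sample of the adversarial waveform may deviate from the original by more than epsilon, which is what keeps the perturbation imperceptible to a listener.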
Deep neural networks (DNNs) have been widely adopted in many fields, and they greatly advance Internet of Health Things (IoHT) systems by mining health-related information. However, recent studies have shown that adversarial attacks pose a serious threat to DNN-based systems, which has raised widespread concern. Attackers maliciously craft adversarial examples (AEs) and blend them into normal examples (NEs) to fool DNN models, seriously distorting the analysis results of IoHT systems. Text is a common data form in such systems, appearing in patients' medical records and prescriptions, and we study the security of DNNs for textual analysis. Because identifying and correcting AEs in discrete textual representations is extremely challenging, existing detection techniques remain limited in performance and generalizability, especially in IoHT systems. In this paper, we propose an efficient and structure-free adversarial detection method that detects AEs even in attack-unknown and model-agnostic circumstances. We reveal that sensitivity inconsistency prevails between AEs and NEs, leading them to react differently when important words in the text are perturbed. This discovery motivates us to design an adversarial detector based on adversarial features extracted from this sensitivity inconsistency. Since the proposed detector is structure-free, it can be deployed directly in off-the-shelf applications without modifying the target models. Compared to state-of-the-art detection methods, our method improves adversarial detection performance, with an adversarial recall of up to 99.7% and an F1-score of up to 97.8%. In addition, extensive experiments show that our method achieves superior generalizability, as it can be generalized across different attackers, models, and tasks.
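The core idea, that AEs and NEs react differently when important words are perturbed, can be illustrated with a small sketch. Everything below is a toy assumption rather than the authors' implementation: `toy_model` stands in for a trained text classifier, the word "fre3" stands in for an attacker's character-level perturbation, and the threshold is arbitrary. An AE's prediction typically hinges on a few perturbed tokens, so masking its most important word shifts the model's confidence sharply, while an NE's prediction rests on broader evidence and shifts little.

```python
def word_importance(model, words):
    """Rank word positions by the confidence drop when each word is masked."""
    base = model(words)
    drops = []
    for i in range(len(words)):
        masked = words[:i] + ["[MASK]"] + words[i + 1:]
        drops.append((base - model(masked), i))
    return sorted(drops, reverse=True)

def sensitivity_feature(model, words, k=2):
    """Mean confidence shift when the top-k most important words are masked.
    Large values suggest the prediction hinges on a few (possibly
    adversarially perturbed) words."""
    ranked = word_importance(model, words)
    shifts = []
    for _, i in ranked[:k]:
        masked = words[:i] + ["[MASK]"] + words[i + 1:]
        shifts.append(abs(model(words) - model(masked)))
    return sum(shifts) / len(shifts)

def detect(model, words, threshold=0.3):
    """Flag an input as adversarial if its sensitivity exceeds the threshold."""
    return sensitivity_feature(model, words) > threshold

def toy_model(words):
    """Stand-in classifier: a single perturbed trigger ('fre3') dominates its
    confidence, while normal evidence words contribute gradually."""
    if "fre3" in words:
        return 0.95
    keywords = {"win", "prize", "claim"}
    return sum(w in keywords for w in words) / max(len(words), 1)
```

With this toy model, masking "fre3" in an AE like `["fre3", "offer", "today", "now"]` collapses the confidence from 0.95 to 0.0, so the input is flagged, whereas masking any one word of the NE `["win", "prize", "claim", "now"]` moves the confidence only slightly. Because the detector only queries the model's outputs, it is structure-free in the sense the abstract describes: no access to or modification of the model's internals is required.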