Speaker recognition is the task of identifying a speaker from audio recordings. Recently, advances in deep learning have considerably boosted the development of speech signal processing techniques, and speaker and speech recognition have been widely adopted in applications such as smart locks, smart vehicle-mounted systems, and financial services. However, deep neural network-based speaker recognition systems (SRSs) are susceptible to adversarial attacks, which fool a system into making wrong decisions through small perturbations, and this has drawn researchers' attention to the security of SRSs. Unfortunately, there is no systematic review of this domain. In this work, we conduct a comprehensive survey to fill this gap, covering the development of SRSs as well as adversarial attacks and defenses against them. Specifically, we first introduce the mainstream frameworks of SRSs and some commonly used datasets. Then, from the perspectives of adversarial example generation and evaluation, we introduce the different attack tasks, the prior knowledge available to attackers, the perturbation objects and constraints, and the indicators used to evaluate attack effectiveness. Next, we focus on effective defense strategies against existing attacks, including adversarial training, attack detection, and input refactoring, and analyze their strengths and weaknesses in terms of fidelity and robustness. Finally, we discuss the challenges posed by audio adversarial examples in SRSs and some valuable future research topics.
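To make the perturbation-constraint setting concrete, the following is a minimal sketch (not taken from the surveyed work) of the most common constraint in audio attacks: an additive, L-infinity-bounded perturbation applied to a waveform. The function name `linf_perturb` and the fixed gradient direction are illustrative assumptions; a real attack would compute the gradient of the SRS model's loss with respect to the input.

```python
def linf_perturb(waveform, gradient, epsilon=0.002):
    """FGSM-style step: move each sample along the sign of a loss gradient,
    then clip so the perturbation stays inside an epsilon ball around the
    original waveform and the signal stays in the valid [-1, 1] range.

    NOTE: `gradient` is a stand-in here; in practice it comes from
    backpropagation through the target speaker recognition model.
    """
    adv = []
    for x, g in zip(waveform, gradient):
        # Step in the sign direction of the gradient (0 if gradient is 0).
        step = x + epsilon * (1 if g > 0 else -1 if g < 0 else 0)
        # Project back into the L-infinity ball of radius epsilon.
        step = min(max(step, x - epsilon), x + epsilon)
        # Keep the sample within the valid audio amplitude range.
        adv.append(min(max(step, -1.0), 1.0))
    return adv
```

The clipping steps are what implement the "perturbation constraint": no sample of the adversarial waveform may deviate from the original by more than epsilon, which is what keeps the perturbation imperceptible to a listener.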
Deep neural networks (DNNs) have been widely adopted in many fields, and they greatly advance Internet of Health Things (IoHT) systems by mining health-related information. However, recent studies have shown that adversarial attacks pose a serious threat to DNN-based systems, which has raised widespread concern. Attackers maliciously craft adversarial examples (AEs) and blend them into normal examples (NEs) to fool DNN models, seriously distorting the analysis results of IoHT systems. Text is a common data form in such systems, appearing in patients' medical records and prescriptions, and we study the security of DNNs for textual analysis. Because identifying and correcting AEs in discrete textual representations is extremely challenging, existing detection techniques remain limited in performance and generalizability, especially in IoHT systems. In this paper, we propose an efficient and structure-free adversarial detection method that detects AEs even in attack-unknown and model-agnostic circumstances. We reveal that sensitivity inconsistency prevails between AEs and NEs, leading them to react differently when important words in the text are perturbed. This discovery motivates us to design an adversarial detector based on adversarial features extracted from this sensitivity inconsistency. Since the proposed detector is structure-free, it can be deployed directly in off-the-shelf applications without modifying the target models. Compared to state-of-the-art detection methods, our method improves adversarial detection performance, with an adversarial recall of up to 99.7% and an F1-score of up to 97.8%. In addition, extensive experiments show that our method achieves superior generalizability, as it can be generalized across different attackers, models, and tasks.
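The core idea, that AEs and NEs react differently when important words are perturbed, can be illustrated with a small sketch. Everything below is a toy assumption rather than the authors' implementation: `toy_model` stands in for a trained text classifier, the word "fre3" stands in for an attacker's character-level perturbation, and the threshold is arbitrary. An AE's prediction typically hinges on a few perturbed tokens, so masking its most important word shifts the model's confidence sharply, while an NE's prediction rests on broader evidence and shifts little.

```python
def word_importance(model, words):
    """Rank word positions by the confidence drop when each word is masked."""
    base = model(words)
    drops = []
    for i in range(len(words)):
        masked = words[:i] + ["[MASK]"] + words[i + 1:]
        drops.append((base - model(masked), i))
    return sorted(drops, reverse=True)

def sensitivity_feature(model, words, k=2):
    """Mean confidence shift when the top-k most important words are masked.
    Large values suggest the prediction hinges on a few (possibly
    adversarially perturbed) words."""
    ranked = word_importance(model, words)
    shifts = []
    for _, i in ranked[:k]:
        masked = words[:i] + ["[MASK]"] + words[i + 1:]
        shifts.append(abs(model(words) - model(masked)))
    return sum(shifts) / len(shifts)

def detect(model, words, threshold=0.3):
    """Flag an input as adversarial if its sensitivity exceeds the threshold."""
    return sensitivity_feature(model, words) > threshold

def toy_model(words):
    """Stand-in classifier: a single perturbed trigger ('fre3') dominates its
    confidence, while normal evidence words contribute gradually."""
    if "fre3" in words:
        return 0.95
    keywords = {"win", "prize", "claim"}
    return sum(w in keywords for w in words) / max(len(words), 1)
```

With this toy model, masking "fre3" in an AE like `["fre3", "offer", "today", "now"]` collapses the confidence from 0.95 to 0.0, so the input is flagged, whereas masking any one word of the NE `["win", "prize", "claim", "now"]` moves the confidence only slightly. Because the detector only queries the model's outputs, it is structure-free in the sense the abstract describes: no access to or modification of the model's internals is required.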