Attack on Practical Speaker Verification System Using Universal Adversarial Perturbations

Zhang, Weiyi; Zhao, Shuning; Liu, Le; Li, Jianmin; Cheng, Xingliang; Zheng, Thomas Fang; Hu, Xiaolin

doi:10.1109/icassp39728.2021.9413467

Cited by 33 publications

(18 citation statements)

References 16 publications


“…We use L∞ and L2 norms to quantify the perturbation magnitude in adversarial example generation, and adopt SNR and PESQ to measure the imperceptibility of crafted adversarial voices. These metrics have been widely adopted in the literature [9], [10], [11], [13], [14], [15], [16] and in general, can consistently reflect the degree of distortions according to our experimental results. Moreover, PESQ is an objective perceptual measure simulating the human auditory system [62].…”

Section: Discussion of Limitations (supporting)

confidence: 73%
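The perturbation metrics named in the statement above can be illustrated with a minimal sketch. It assumes equal-length waveforms given as plain lists of float samples; the function name `perturbation_metrics` and the toy signals are hypothetical, not taken from the cited work. PESQ is left out because it needs a dedicated implementation of the perceptual model.

```python
import math

def perturbation_metrics(clean, adversarial):
    """Return (L-inf norm, L2 norm, SNR in dB) of the perturbation.

    Hypothetical helper for illustration; both inputs are equal-length,
    non-empty sequences of float samples.
    """
    if len(clean) != len(adversarial) or len(clean) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    # The perturbation is the sample-wise difference between the two signals.
    delta = [a - c for c, a in zip(clean, adversarial)]
    linf = max(abs(d) for d in delta)           # largest single-sample change
    noise_energy = sum(d * d for d in delta)
    l2 = math.sqrt(noise_energy)                # overall perturbation energy
    signal_energy = sum(c * c for c in clean)
    if noise_energy == 0:
        snr_db = math.inf                       # identical signals
    elif signal_energy == 0:
        snr_db = -math.inf                      # silent reference signal
    else:
        snr_db = 10 * math.log10(signal_energy / noise_energy)
    return linf, l2, snr_db

# Toy example: two of the four samples are shifted by 0.05.
clean = [0.5, -0.5, 0.5, -0.5]
adversarial = [0.55, -0.5, 0.45, -0.5]
linf, l2, snr_db = perturbation_metrics(clean, adversarial)
print(f"L-inf={linf:.3f}  L2={l2:.4f}  SNR={snr_db:.2f} dB")
```

A higher SNR means the perturbation is weaker relative to the original voice, while the L∞ and L2 norms bound the perturbation directly.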

“…Recently, adversarial attacks on speaker recognition have been extensively studied [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]. Results show that both state-of-the-art open-source and commercial SRSs can be fooled by adding small perturbations to the original voice, even playing over the air in the physical world.…”

Section: Motivation (mentioning)

confidence: 99%

“…A.1 Details of the Datasets. Spk10-enroll consists of 10 speakers (5 males and 5 females), 10 voices per speaker. The speakers are randomly selected from the "test-other" and "dev-other" subsets of the popular dataset Librispeech [9], [10], [11], [15], [57]. For each speaker, we select the top-10 longest voices in order to have better enrollment embedding [98], [99].…”

Section: Appendix A Supplemental Materials (mentioning)

confidence: 99%

“…The popularity of SRSs has brought new security concerns. Recent studies have shown that both open-source and commercial SRSs are vulnerable to adversarial attacks [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]. To thwart adversarial attacks, five input transformations [15], [16], [18], [19] and two adversarial training [9], derived from other domains, have been studied.…”

Section: Introduction (mentioning)

confidence: 99%

“…To thoroughly evaluate the defenses, we extend and implement all the recent promising adversarial attacks [7], [8], [9], [10], [15], [16], [17], [21], including 4 white-box attacks and 3 black-box attacks. The evaluation on 22 concrete attacks shows that the effectiveness of transformations does not necessarily decrease with increase of both distortion and attack strength, and their effectiveness varies with attacks, e.g., two time-domain transformations are more effective than others against L∞ attacks (i.e., perturbations are limited in L∞ norm) and feature-level transformations are often more effective than others against L2 white-box attacks.…”

Section: Introduction (mentioning)

confidence: 99%


Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

Chen, Zhao, Fu, et al. 2022

Preprint


Speaker recognition systems (SRSs) have recently been shown to be vulnerable to adversarial attacks, raising significant security concerns. In this work, we systematically investigate transformation and adversarial training based defenses for securing SRSs. According to the characteristic of SRSs, we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks (4 white-box and 3 black-box) on speaker recognition. With careful regard for best practices in defense evaluations, we analyze the strength of transformations to withstand adaptive attacks. We also evaluate and understand their effectiveness against adaptive attacks when combined with adversarial training. Our study provides lots of useful insights and findings, many of them are new or inconsistent with the conclusions in the image and speech recognition domains, e.g., variable and constant bit rate speech compressions have different performance, and some non-differentiable transformations remain effective against current promising evasion techniques which often work well in the image domain. We demonstrate that the proposed novel feature-level transformation combined with adversarial training is rather effective compared to the sole adversarial training in a complete white-box setting, e.g., increasing the accuracy by 13.62% and attack cost by two orders of magnitude, while other transformations do not necessarily improve the overall defense capability. This work sheds further light on the research directions in this field. We also release our evaluation platform SPEAKERGUARD to foster further research.
