Proceedings of the 2019 Network and Distributed System Security Symposium (NDSS), 2019
DOI: 10.14722/ndss.2019.23362

Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

Abstract: Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands: audio obscured by noise that is correctly recognized by a VPS but not by human beings. Such attacks, though, are often highly dependent on white-box knowledge of a specific machine learning model …
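To make the idea of an obscured command concrete, the sketch below shows one plausible signal-processing perturbation in Python: reversing the samples inside very short windows scrambles how the audio sounds to a listener while largely preserving the short-time magnitude spectrum that typical ASR front-ends rely on. The file names, window length, and choice of perturbation are illustrative assumptions, not the paper's exact attack pipeline.

import numpy as np
from scipy.io import wavfile

def invert_short_windows(audio, window=20):
    # Reverse each short run of samples (roughly 1 ms at a 16 kHz rate).
    # Within such a short window the magnitude spectrum changes little,
    # so MFCC-style features may survive even though the waveform
    # sounds like noise to a human listener.
    out = audio.copy()
    for start in range(0, len(out) - window, window):
        out[start:start + window] = out[start:start + window][::-1]
    return out

rate, command = wavfile.read("spoken_command.wav")  # hypothetical recording of a voice command
obfuscated = invert_short_windows(command)
wavfile.write("obfuscated_command.wav", rate, obfuscated)
# Whether the target VPS still transcribes the obfuscated file correctly
# must be verified empirically against the black-box model.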

Cited by 120 publications (164 citation statements) | References 37 publications
“…4(b). We use a chirp signal from 50 Hz to 2 … Out-of-plane displacement is defined as the displacement along the x₃ direction. In-plane displacement is defined as the displacement along the x₁ direction.…”
Section: B. Triggering Non-linearity Effect via Solid Medium (mentioning, confidence: 99%)
“…With the rapidly growing popularity and functionality of voice-driven IoT devices, voice-based attacks have become a non-negligible security risk. Gong et al. investigate and classify voice-based attacks [20] into four major categories: basic voice replay attacks [12], [29], [36], operating-system-level attacks [3], [15], [26], [53], machine-learning-level attacks [2], [9], [10], [13], [19], [43], [48], [51], and hardware-level attacks [28], [52]. A machine-learning-level attack uses adversarial audio commands to attack automatic speech recognition (ASR) systems.…”
Section: Related Work (mentioning, confidence: 99%)
“…For instance, Wi-Fi typically works within around 10 meters, Bluetooth within several meters, and NFC within around 10 centimeters. Speakers that only support NFC are clearly not ideal for remote hacking, since that requires the attackers to be inside the home, close enough to the speaker. For speakers supporting Bluetooth or Wi-Fi, once attackers can stay within a short distance of them from outside the home, the speakers will be visible to the attackers' audio devices (either Bluetooth- or Wi-Fi-capable).…”
Section: Wireless Speakers (mentioning, confidence: 99%)
“…Vaidya et al. [52] and Carlini et al. [16] observed that attackers could issue hidden voice commands that are unrecognizable to human listeners but are interpreted as the desired commands by the CMU Sphinx speech recognition system; in their black-box attack, the voice commands are also understood by the Google Speech API. Similarly, Hadi et al. [7] use four methods to generate noisy audio that practically attacks several speech recognition models. Yuan et al. [56] stealthily embedded voice commands into regular songs, which can compromise Kaldi, a popular open-source speech recognition system.…”
Section: Related Work (mentioning, confidence: 99%)