“…Thus, inaudible attacks [5], [21], [22] have been proposed, which exploit carrier signals outside the audible frequencies of human beings (e.g., 40 kHz) to inject attacks into ASR systems utilizing the nonlinearity vulnerability of microphone circuits, yet entirely unheard by victims. However, compared with audible playback speech samples, such attacks usually suffer from signal distortion and low SNR due to their dependence on various convert channels, e.g., ultrasound [57], laser [6], or electricity [24] signals, and the hardware imperfections these channels introduce. There is also a major branch of the research community that leverages the vulnerability of ASR models by adding slightly audible perturbations on the benign audio based on ϵ-constraint [7], [58] and psychoacoustic hiding [3], [4], to make the AEs sound benign but fool the ASR's transcription.…”