Despite their immense popularity, deep learning-based acoustic systems are inherently vulnerable to adversarial attacks, in which maliciously crafted audios trigger target systems to misbehave. In this paper, we present SA, a new class of attacks for generating adversarial audios. Compared with existing attacks, SA offers a set of significant features: (i) versatile: it deceives a range of end-to-end acoustic systems under both white-box and black-box settings; (ii) effective: it generates adversarial audios that are recognized as specific phrases by target acoustic systems; and (iii) stealthy: it generates adversarial audios that are indistinguishable from their benign counterparts to human perception. We empirically evaluate SA on a set of state-of-the-art deep learning-based acoustic systems (including speech command recognition, speaker recognition, and sound event classification); the results demonstrate the versatility, effectiveness, and stealthiness of SA. For instance, SA achieves a 99.45% attack success rate on the IEMOCAP dataset against the ResNet18 model, and the generated adversarial audios are also misinterpreted by multiple popular ASR platforms, including Google Cloud Speech, Microsoft Bing Voice, and IBM Speech-to-Text. We further evaluate three potential defenses against such attacks, pointing to promising directions for future research.