2019 IEEE Conference on Games (CoG)
DOI: 10.1109/cig.2019.8848088
“Did You Hear That?” Learning to Play Video Games from Audio Cues

Abstract: Game-playing AI research has focused for a long time on learning to play video games from visual input or symbolic information. However, humans benefit from a wider array of sensors which we utilise in order to navigate the world around us. In particular, sounds and music are key to how many of us perceive the world and influence the decisions we make. In this paper, we present initial experiments on game-playing agents learning to play video games solely from audio cues. We expand the Video Game Description L…

Cited by 9 publications (6 citation statements)
References 8 publications (6 reference statements)
“…Adversarial attacks on speech recognition systems also have been studied [10], [8], [79]. Nicholas et al [8] attacked DeepSpeech [80] by crafting adversarial voices in the whitebox setting, but failed to attack when playing over the air.…”
Section: Related Work
confidence: 99%
“…In the black-box setting, Rohan et al [10] combined a genetic algorithm with finite difference gradient estimation to craft adversarial voices for DeepSpeech, but achieved a limited success rate with strict length restriction over the voices. Alzantot et al [79] presented the first black-box adversarial attack on a CNN-based speech command classification model by exploiting a genetic algorithm. However, due to the difference between speaker recognition and speech recognition, these works are orthogonal to our work and cannot be applied to ivector-PLDA and GMM-UBM based SRSs.…”
Section: Related Work
confidence: 99%
“…There have been a number of prior studies relating to game playing AIs with sound. Gaina and Stephenson [5] expanded the General Video Game AI framework to support sound and trained an AI that played the game from sound only. Hegde et al [25] extended the VizDoom framework to provide the in-game sound to AIs and trained them in several scenarios with increasing difficulty to test the perception of sound.…”
Section: B. AI Interface and Blind AIs
confidence: 99%
“…One of which is visually impaired players (VIs), which have been mostly ignored in the past [4]. Game developers or researchers are adding new features such as specific audio cues so that VIs can also experience and enjoy the games [5].…”
Section: Introduction
confidence: 99%
“…A number of prior projects explored RL with audio observations. Gaina and Stephenson [8] augmented General Video Game AI framework to support sound, focusing on 2D spritebased games. Fig.…”
Section: Related Work
confidence: 99%