Perceptions of crime do not necessarily reflect its realities. The leading works in Anglo-American criminology have focused on this asymmetry between the perceptions and the realities of crime. Although Japanese academics have conducted sporadic studies on the fear of crime, this area of research has not yet reached maturity. The public perception of Japan as a safe and secure country might explain this relative lack of research. At present, most people do not think of crime as common in their communities. The mass media, however, has highlighted topical crime incidents as signs of a changing community order. Serious study of the fear of crime in Japan must therefore explore the complex mechanisms operating both within and outside the community.
We propose an audio-visual speech enhancement (AVSE) method conditioned both on the speaker's lip motion and on speaker-discriminative embeddings. In particular, we explore a method of extracting the embeddings directly from noisy audio in the AVSE setting, without an enrollment procedure. We aim to improve speech-enhancement performance by conditioning the model on this embedding. To achieve this goal, we devise an AV voice activity detection (AV-VAD) module and a speaker identification module for the AVSE model. The AV-VAD module identifies reliable frames from which the identification module can extract a robust embedding, which is then used together with the lip motion to perform the enhancement. To train these modules effectively, we propose multi-task learning across AVSE, speaker identification, and VAD. Experimental results show that our method (1) extracts robust speaker embeddings directly from the noisy audio without an enrollment procedure and (2) improves enhancement performance compared with conventional AVSE methods.
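The abstract does not say how the embedding is pooled from the reliable frames or how the three losses are combined. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes the speaker identification module emits one feature vector per frame and the AV-VAD module emits a per-frame speech probability. Frames below a hypothetical `threshold` are discarded, and the rest are averaged with their VAD probabilities as weights and L2-normalised. The multi-task objective is shown as a plain weighted sum with hypothetical weights `w_id` and `w_vad`. All function names are illustrative.

```python
import math
from typing import List, Sequence


def vad_weighted_embedding(frame_feats: Sequence[Sequence[float]],
                           vad_probs: Sequence[float],
                           threshold: float = 0.5) -> List[float]:
    """Pool frame-level features into one utterance-level speaker embedding.

    Assumption: frames whose (hypothetical) AV-VAD speech probability is below
    `threshold` are treated as unreliable and skipped; the remaining frames are
    averaged with their probabilities as weights, then L2-normalised.
    """
    if not frame_feats or len(frame_feats) != len(vad_probs):
        raise ValueError("need a non-empty sequence with one VAD probability per frame")
    dim = len(frame_feats[0])
    acc = [0.0] * dim
    total = 0.0
    for feat, p in zip(frame_feats, vad_probs):
        if p < threshold:
            continue  # likely non-speech or noise-dominated frame
        total += p
        for i, x in enumerate(feat):
            acc[i] += p * x
    if total == 0.0:
        raise ValueError("no frame passed the VAD threshold")
    mean = [x / total for x in acc]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean


def multitask_loss(l_se: float, l_id: float, l_vad: float,
                   w_id: float = 0.1, w_vad: float = 0.1) -> float:
    """Weighted sum of enhancement, speaker-ID, and VAD losses (weights are hypothetical)."""
    return l_se + w_id * l_id + w_vad * l_vad


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
    probs = [0.9, 0.1, 0.9]  # the middle frame is rejected by the VAD
    print(vad_weighted_embedding(feats, probs))  # ≈ [0.7071, 0.7071]
    print(multitask_loss(1.0, 2.0, 3.0))
```

Weighting by the VAD probability, rather than simply averaging every frame, is one plausible way to keep noise-dominated frames from corrupting the embedding; a learned attention pooling would be another.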