In this paper, we propose a replay attack detection (RAD) method that uses spatial and spectral features of a stereo signal. To distinguish genuine from replayed utterances, we focus on non-speech segments: during these segments a human speaker emits no sound, whereas a loudspeaker used for a replay attack may still emit recorded background noise or its own electromagnetic noise. Spatial features based on the generalized cross-correlation (GCC) between the two channels capture this difference. To improve robustness across diverse recording environments, we combine the spatial features with spectral features; specifically, we fuse the output scores of the GCC-based and spectral-feature-based methods. Experiments confirm the effectiveness of combining spatial and spectral features.
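As a rough illustration of the GCC-based spatial features and the score fusion described above, the following Python sketch computes a circular GCC between the left and right channels of a stereo frame and keeps its values over a small lag window as a feature vector. It assumes PHAT weighting, a lag window `max_lag`, and a simple weighted-sum fusion; these choices, and the helper names `gcc_phat`, `estimate_delay` and `fuse_scores`, are hypothetical and not taken from the paper. It also omits the non-speech segment detection and the classifiers that would produce the scores. A naive DFT keeps the example dependency-free; a practical implementation would use an FFT.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform (dependency-free for clarity)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT matching dft() above."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gcc_phat(left, right, max_lag, eps=1e-12):
    """Circular GCC-PHAT of two equal-length channels for lags -max_lag..max_lag.

    Assumption: PHAT weighting. A positive lag means `left` is a delayed copy of `right`.
    """
    n = len(left)
    cross = [a * b.conjugate() for a, b in zip(dft(left), dft(right))]
    # PHAT weighting: keep only the phase of the cross-spectrum.
    weighted = [c / (abs(c) + eps) for c in cross]
    corr = [v.real for v in idft(weighted)]
    # Index with lag % n so negative lags wrap around the circular correlation.
    return [corr[lag % n] for lag in range(-max_lag, max_lag + 1)]


def estimate_delay(left, right, max_lag):
    """Lag (in samples) at which the GCC-PHAT feature vector peaks."""
    feat = gcc_phat(left, right, max_lag)
    return max(range(len(feat)), key=feat.__getitem__) - max_lag


def fuse_scores(spatial_score, spectral_score, weight=0.5):
    """Hypothetical linear score-level fusion; `weight` would be tuned on development data."""
    return weight * spatial_score + (1.0 - weight) * spectral_score


if __name__ == "__main__":
    rng = random.Random(0)
    right = [rng.uniform(-1.0, 1.0) for _ in range(64)]
    left = [right[(t - 3) % 64] for t in range(64)]  # left lags right by 3 samples
    print(estimate_delay(left, right, max_lag=8))  # 3
    print(fuse_scores(0.8, 0.2, weight=0.25))
```

The PHAT weighting whitens the cross-spectrum so that the correlation peak stays sharp regardless of the signal's spectral shape; whether the paper uses this or another GCC weighting is not stated in the abstract.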