This paper proposes a pop-noise detector that uses phoneme information within a voice liveness detection (VLD) framework. In recent years, spoofing attacks (e.g., replay, speech synthesis, and voice conversion) have become a serious threat to speaker verification systems, and several techniques have been proposed to protect these systems against them. The VLD framework has been proposed as one of the fundamental solutions: it determines whether an input sample was uttered by an actual human or played back through a loudspeaker. To realize the VLD framework, pop-noise detection methods have been proposed, and these methods perform well as the VLD module. However, pop-noise is a common distortion in speech that occurs when a speaker's breath reaches a microphone, so it can also be produced by wind or deliberately by an attacker. This is a weakness of pop-noise detection methods. To improve their robustness, this paper proposes a pop-noise detector that uses phoneme information as evidence that the speaker is an actual human. The experimental results show that the proposed method increases the robustness of VLD against spoofing attacks.
This paper proposes a phoneme-based pop-noise (PN) detection algorithm for voice liveness detection (VLD) and automatic speaker verification systems. Recently, many countermeasures against spoofing attacks (e.g., replay, speech synthesis) have been reported for speaker verification systems. A principal mechanism of almost all spoofing attacks is to replay recorded speech through a loudspeaker. Therefore, one effective solution against spoofing attacks is to determine whether an input speech sample is a genuine voice or a replayed one; this is the VLD framework. To realize the VLD framework, PN detection methods have been proposed. Since PN is a common distortion that occurs when a speaker's breath reaches the inside of a microphone, conventional PN detection methods simply capture PN periods in the input speech. However, the performance of these methods depends on the microphone type and the phrase spoken, which may leave conventional PN detection methods vulnerable. This paper proposes a novel PN detection method that focuses on specific characteristics of phonemes related to the PN phenomenon. The experimental results show that the proposed method achieves higher performance than conventional PN detection methods.