Interspeech 2017
DOI: 10.21437/interspeech.2017-456

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification

Abstract: For practical automatic speaker verification (ASV) systems, replay attack poses a true risk. By replaying a pre-recorded speech signal of the genuine speaker, ASV systems tend to be easily fooled. An effective replay detection method is therefore highly desirable. In this study, we investigate a major difficulty in replay detection: the over-fitting problem caused by variability factors in speech signal. An F-ratio probing tool is proposed and three variability factors are investigated using this tool: speaker…
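
The abstract names an F-ratio probing tool but the excerpt does not give its formula. As a rough illustration only, the sketch below uses the common Fisher F-ratio for a single scalar feature: the variance of the class means divided by the average within-class variance. The function name `f_ratio` and the toy values are assumptions made for this example and are not taken from the paper.

```python
from statistics import fmean


def f_ratio(groups):
    """Fisher F-ratio of one scalar feature (hypothetical helper).

    groups: list of lists, one list of feature values per class
            (e.g. one per speaker, or genuine vs. replayed speech).
    Returns variance of the class means / mean within-class variance.
    """
    if len(groups) < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two classes with two samples each")

    class_means = [fmean(g) for g in groups]
    grand_mean = fmean(class_means)

    # Spread of the class means around their overall mean.
    between = fmean((m - grand_mean) ** 2 for m in class_means)

    # Average spread of the samples around their own class mean.
    within = fmean(
        fmean((x - m) ** 2 for x in g) for g, m in zip(groups, class_means)
    )
    return between / within


# Toy values (made up): one feature dimension measured on two classes.
genuine = [0.0, 2.0]
replay = [4.0, 6.0]
score = f_ratio([genuine, replay])
```

Under this definition, a feature dimension whose F-ratio is high when grouped by speaker but low when grouped by genuine versus replayed speech would be dominated by speaker variability, which is the kind of factor the abstract associates with over-fitting.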

Cited by 42 publications (29 citation statements)
References 9 publications (9 reference statements)
“…For example, artifacts introduced by the acoustic surroundings, such as reverberation, might be confused with the artifacts introduced by playback in some cases. Li et al [9] tried to use machine learning to detect replay attacks, but only obtained poor performance due to overfitting.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, artifacts introduced by the acoustic surroundings, such as reverberation, might be confused with the artifacts introduced by playback in some cases. Li et al [9] tried to use machine learning to detect replay attacks, but only obtained poor performance due to overfitting. In the ASVspoof 2017 challenge, an official corpus for detecting replay attack and a baseline system, based on the Gaussian mixture model (GMM) with constant Q cepstral coefficient (CQCC) features [10], were provided. The challenge required participants to propose a method that distinguished between genuine speech and a replay recording, where a total of 49 submissions were received from the participants.…”
mentioning
confidence: 99%
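
The statement above describes a baseline built on GMMs over CQCC features. A two-class GMM detector of that kind is typically scored with a log-likelihood ratio between a genuine-speech model and a replay model. The sketch below illustrates only that scoring step, assuming diagonal-covariance GMMs whose parameters are already trained; feature extraction and training are omitted. All names (`gmm_avg_loglik`, `llr_score`) and the one-dimensional toy models are hypothetical, not the challenge's actual baseline code.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood under a GMM.

    gmm: list of (weight, mean_vector, variance_vector) components.
    Uses log-sum-exp over components for numerical stability.
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total / len(frames)


def llr_score(frames, genuine_gmm, replay_gmm):
    """Positive scores favour genuine speech, negative favour replay."""
    return gmm_avg_loglik(frames, genuine_gmm) - gmm_avg_loglik(frames, replay_gmm)


# Toy 1-D models (made up): in practice frames would be cepstral vectors.
genuine_gmm = [(1.0, [0.0], [1.0])]
replay_gmm = [(1.0, [3.0], [1.0])]
score_genuine_like = llr_score([[0.0]], genuine_gmm, replay_gmm)
score_replay_like = llr_score([[3.0]], genuine_gmm, replay_gmm)
```

A decision threshold on the score (zero in the simplest case) then separates genuine from replayed utterances; where that threshold sits is a tuning choice not covered here.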
“…At such distances, some acoustic features can be used to identify the sound source of the speaker, e.g., in [31], [32], the authors use the "pop noise" caused by breathing to identify a live speaker. Other efforts [33], [34], [35] do not explicitly use close distance features, but the databases they use to develop their defense strategies were recorded at … In the recording phase, the attacker records or synthesizes a malicious voice command. In the playback phase, the malicious voice command is transmitted from the playback device to the victim device over the air.…”
Section: B Sound Source Identification Using Acoustic Cues (mentioning)
confidence: 99%
“…Constant Q cepstral coefficients (CQCC), which is proposed by Todisco M et al [7], was adopted in the baseline system for this challenge. After that, various features were used in recent literature to improve the performance of replay detection, such as the inverted Mel-frequency cepstral coefficients (IMFCC) [8], single frequency filtering coefficients (SFFCC) [9], high-frequency cepstral coefficients (HFCC) [10], and linear frequency cepstral coefficients (LFCC) [11]. All these works used CQCC features as baseline features and a Gaussian Mixture Model (GMM) classifier for the final classification.…”
Section: Introduction (mentioning)
confidence: 99%