Interspeech 2015
DOI: 10.21437/interspeech.2015-472

A comparison of features for synthetic speech detection

Abstract: The performance of biometric systems based on automatic speaker recognition technology is severely degraded by spoofing attacks with synthetic speech generated using different voice conversion (VC) and speech synthesis (SS) techniques. Various countermeasures have been proposed to detect this type of attack, and in this context, choosing an appropriate feature extraction technique for capturing relevant information from speech is an important issue. This paper presents a concise experimental review of different …

Cited by 205 publications (106 citation statements); references 36 publications.
“…While we obtained similar results using alternative representations such as linear frequency cepstral coefficients (LFCCs) [7,39], all work reported in this paper was performed with 60-dimensional log linear filter bank (LFB) features extracted from 30 ms windows with a 10 ms frame shift and from audio waveforms which are truncated or concatenated to ≈4-second segments (64600 samples). To improve generalisation, we applied frequency masking augmentation [17,40] to mask a random selection of contiguous frequency bands during training.…”
Section: Implementation Details (citation type: mentioning, confidence: 61%)
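The front end described in the quote above can be sketched in NumPy as follows. This is a minimal illustration, not the cited authors' code: the 16 kHz sample rate, the Hamming window, the 512-point FFT, filter edges spread evenly from 0 Hz to Nyquist, and filling masked bands with the feature mean are all assumptions, and the helper names (`fix_length`, `linear_filterbank`, `log_lfb`, `freq_mask`) are hypothetical. Short waveforms are tiled (concatenated with copies of themselves) and long ones truncated to 64600 samples, which is about 4.04 s at 16 kHz.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumption: 16 kHz input audio
TARGET_LEN = 64600   # fixed segment length quoted above (~4.04 s at 16 kHz)


def fix_length(wave: np.ndarray, target_len: int = TARGET_LEN) -> np.ndarray:
    """Truncate long waveforms; repeat (concatenate) short ones up to target_len."""
    if len(wave) >= target_len:
        return wave[:target_len]
    reps = -(-target_len // len(wave))  # ceiling division
    return np.tile(wave, reps)[:target_len]


def linear_filterbank(n_filters: int, n_fft: int, sr: int) -> np.ndarray:
    """Triangular filters whose edges are equally spaced in Hz (0 Hz to Nyquist)."""
    bin_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = np.linspace(0.0, sr / 2, n_filters + 2)
    fb = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, centre, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (centre - lo)
        falling = (hi - bin_freqs) / (hi - centre)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_lfb(wave: np.ndarray, sr: int = SAMPLE_RATE, n_filters: int = 60,
            win_ms: float = 30.0, hop_ms: float = 10.0, n_fft: int = 512) -> np.ndarray:
    """Log linear filter-bank energies, shape (n_frames, n_filters)."""
    win = int(round(sr * win_ms / 1000))  # 480 samples at 16 kHz
    hop = int(round(sr * hop_ms / 1000))  # 160 samples at 16 kHz
    n_frames = 1 + (len(wave) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hamming(win)                # window each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # frames zero-padded to n_fft
    energies = power @ linear_filterbank(n_filters, n_fft, sr).T
    return np.log(energies + 1e-10)                     # small floor avoids log(0)


def freq_mask(feats: np.ndarray, max_width: int = 10, rng=None) -> np.ndarray:
    """Mask one random run of contiguous frequency bands (training time only)."""
    rng = np.random.default_rng() if rng is None else rng
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, feats.shape[1] - width + 1))
    out = feats.copy()
    out[:, start:start + width] = feats.mean()  # assumption: fill with the mean
    return out
```

With these settings a 64600-sample input yields 401 frames of 60 log filter-bank energies. Masking hides a random run of up to `max_width` adjacent bands, so a classifier trained this way cannot rely on any single narrow frequency region.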
“…We extract 60-dimensional linear frequency cepstral coefficients (LFCC) [28] of speech as low-level input feature to the neural network. The extraction way is the same as the ASVspoof2019 official baseline system [27].…”
Section: Methods (citation type: mentioning, confidence: 99%)
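LFCCs are obtained by applying a discrete cosine transform (DCT) to log energies from a linearly spaced triangular filter bank. The sketch below assumes such a log filter-bank matrix is already available. As a further assumption, it reaches 60 dimensions by stacking 20 static coefficients with their first- and second-order deltas; the exact split used by the ASVspoof 2019 baseline should be checked against its released code. The function names are hypothetical.

```python
import numpy as np


def dct2_ortho(n_in: int, n_out: int) -> np.ndarray:
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * m + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)  # rescale the DC row so the transform is orthonormal
    return basis


def deltas(feats: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression deltas over +/- width frames, repeating the edge frames."""
    n = len(feats)
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k:width + k + n] - padded[width - k:width - k + n])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))


def lfcc(log_fbank: np.ndarray, n_ceps: int = 20) -> np.ndarray:
    """Static LFCCs: DCT over the filter axis of log linear filter-bank energies."""
    return log_fbank @ dct2_ortho(log_fbank.shape[1], n_ceps).T


def lfcc_with_deltas(log_fbank: np.ndarray, n_ceps: int = 20) -> np.ndarray:
    """Stack static, delta and delta-delta coefficients: 3 * n_ceps columns."""
    static = lfcc(log_fbank, n_ceps)
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.hstack([static, d1, d2])
```

Keeping only the low-order DCT coefficients summarises the smooth spectral envelope, while the delta terms add short-term temporal dynamics.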
“…Baseline B02 uses linear frequency cepstral coefficients (LFCCs) [18] and a bandwidth of 30 Hz to 8 kHz. LFCCs are extracted using a 512-point discrete Fourier transform applied to windows of 20 ms with 50% overlap.…”
Section: Spoofing Countermeasures (citation type: mentioning, confidence: 99%)
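The B02 front-end parameters translate into concrete sizes as follows, assuming 16 kHz audio (the sample rate is not stated in the quote). A 20 ms window is 320 samples, 50% overlap gives a 160-sample hop, and a 512-point DFT gives 257 bins spaced 31.25 Hz apart, of which bins 1 to 256 fall inside the 30 Hz to 8 kHz band. `FrameConfig` is a hypothetical helper, not part of any baseline release.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameConfig:
    """Hypothetical container for the quoted B02 front-end settings."""
    sample_rate: int = 16000  # assumption: 16 kHz audio
    win_ms: float = 20.0      # 20 ms analysis window
    overlap: float = 0.5      # 50% overlap between frames
    n_fft: int = 512          # 512-point DFT
    f_min: float = 30.0       # lower band edge (Hz)
    f_max: float = 8000.0     # upper band edge (Hz)

    @property
    def win_len(self) -> int:
        return round(self.sample_rate * self.win_ms / 1000)

    @property
    def hop_len(self) -> int:
        return round(self.win_len * (1 - self.overlap))

    @property
    def bin_hz(self) -> float:
        return self.sample_rate / self.n_fft

    def band_bins(self) -> range:
        """Indices of DFT bins whose centre frequency lies in [f_min, f_max]."""
        first = math.ceil(self.f_min / self.bin_hz)
        last = min(math.floor(self.f_max / self.bin_hz), self.n_fft // 2)
        return range(first, last + 1)

    def n_frames(self, n_samples: int) -> int:
        """Number of full frames that fit in n_samples (no padding)."""
        if n_samples < self.win_len:
            return 0
        return 1 + (n_samples - self.win_len) // self.hop_len


cfg = FrameConfig()
print(cfg.win_len, cfg.hop_len, cfg.bin_hz, len(cfg.band_bins()))  # 320 160 31.25 256
```

At 16 kHz the upper band edge coincides with the Nyquist frequency, so the 8 kHz limit only excludes the DC bin at the bottom of the spectrum.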
“…Systems are labelled (left) by the anonymised team identifier (TID) [3]. A short description of each follows: T45 [25]: A light CNN (LCNN) which operates upon LFCC features extracted from the first 600 frames and with the same frontend configuration as the B2 baseline CM [18].…”
Section: Single Systems (citation type: mentioning, confidence: 99%)
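Restricting the input to the first 600 frames gives the LCNN a fixed-size input. The sketch below is hypothetical: the quote does not say what T45 does with utterances shorter than 600 frames, so repeating the feature sequence is an assumption (zero padding is another common choice). With a 160-sample hop and 320-sample window at an assumed 16 kHz, 600 frames span 320 + 599 × 160 = 96160 samples, about 6 s of audio.

```python
import numpy as np


def first_n_frames(feats: np.ndarray, n_frames: int = 600) -> np.ndarray:
    """Keep the first n_frames rows of a (frames, dims) feature matrix.

    Shorter inputs are repeated along the time axis and then cut to length
    (assumption: the cited system's padding strategy is not stated).
    """
    if len(feats) >= n_frames:
        return feats[:n_frames]
    reps = -(-n_frames // len(feats))  # ceiling division
    return np.concatenate([feats] * reps, axis=0)[:n_frames]
```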