Interspeech 2018
DOI: 10.21437/interspeech.2018-1680

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification

Abstract: I-vector based text-independent speaker verification (SV) systems often have poor performance with short utterances, as the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable. This paper proposes an i-vector compensation method using a generative adversarial network (GAN), where its generator network is trained to generate a compensated i-vector from a short-utterance i-vector and its discriminator network is trained to determine whether an i-vector is generated by the ge…
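The abstract describes a conditional GAN in which the generator maps a short-utterance i-vector to a compensated one and the discriminator judges i-vectors. The sketch below shows a standard conditional GAN objective in plain Python, under stated assumptions: the discriminator is a hypothetical logistic model that sees the candidate i-vector concatenated with the short-utterance i-vector as its condition, and the "real" examples are i-vectors from long utterances. The names `discriminator` and `cgan_losses` are illustrative only; since the abstract above is truncated, the paper's exact losses may differ.

```python
import math

def discriminator(candidate, condition, weights, bias):
    """Hypothetical logistic discriminator D(candidate | condition).

    The candidate i-vector and the conditioning short-utterance i-vector are
    concatenated, passed through one linear layer, then a sigmoid.
    """
    pair = list(candidate) + list(condition)
    z = sum(w * v for w, v in zip(weights, pair)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def cgan_losses(d_real, d_fake, eps=1e-12):
    """Standard conditional GAN losses averaged over a batch.

    d_real: D(x_long | x_short) for real (assumed long-utterance) i-vectors.
    d_fake: D(G(x_short) | x_short) for generated i-vectors.
    Returns (discriminator_loss, generator_loss); the generator term uses
    the common non-saturating form -log D(G(x_short) | x_short).
    """
    n = len(d_real)
    d_loss = -sum(math.log(r + eps) + math.log(1.0 - f + eps)
                  for r, f in zip(d_real, d_fake)) / n
    g_loss = -sum(math.log(f + eps) for f in d_fake) / len(d_fake)
    return d_loss, g_loss
```

Replacing the log terms with raw critic scores gives the Wasserstein variant mentioned in one of the citation statements on this page.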

Cited by 28 publications (10 citation statements)
References 17 publications (20 reference statements)
“…al. [19], where the discriminative power of a DNN's high-dimensional intermediate representation can differ significantly with small perturbations. In addition, we assume that applying an additive FRM combined with a multiplicative FRM will lead to further improvements.…”
Section: Filter-wise Re-scaling (mentioning, confidence: 99%)
“…In addition, by applying an FRM through adding, we expect to provide small perturbations that lead to increased discriminative power. This is inspired by a previous study [19] that showed analyzing small alterations in high-dimensional space can drastically change discriminative power. By hypothesizing that these two approaches function in a complementary manner, we also propose to apply both approaches in sequence.…”
Section: Introduction (mentioning, confidence: 95%)
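The statements above describe combining an additive and a multiplicative filter-wise re-scaling in sequence. A minimal plain-Python sketch follows, assuming the scale vector is computed by averaging each filter over time, applying a dense layer, and then a sigmoid, and assuming the additive step precedes the multiplicative one. The citing paper's actual layer shapes and ordering may differ, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def filter_scale_vector(feature_map, weights, bias):
    """Derive one scale value per filter from a (filters x time) feature map.

    Assumed design: average each filter over time, apply a dense layer
    (weights is a filters x filters matrix), then a sigmoid so scales lie in (0, 1).
    """
    pooled = [sum(row) / len(row) for row in feature_map]
    return [sigmoid(sum(w * p for w, p in zip(w_row, pooled)) + b)
            for w_row, b in zip(weights, bias)]

def frm_add_then_mul(feature_map, scales):
    # Additive re-scaling followed by multiplicative re-scaling, filter by filter.
    return [[(x + s) * s for x in row] for row, s in zip(feature_map, scales)]
```

With all-zero weights and biases every scale is 0.5, so each value x becomes (x + 0.5) * 0.5; learned weights let each filter receive its own scale.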
“…Motivated by the success of conditional generative adversarial network (CGAN) in speech enhancement [9] and i-vector transformation in short-utterance speaker verification [10], we propose to use CGAN on x-vector embeddings to compensate for additive noise in the speaker diarization framework. The approach is to train CGAN using both clean and noisy x-vectors, which can generate denoised x-vectors from noisy input x-vectors.…”
Section: Denoising System (mentioning, confidence: 99%)
“…We use Wasserstein GAN [11] model in the CGAN framework. Furthermore, similar to [10], we incorporate multi-task training of G, where the generator network is integrated with another network Gsup for speaker prediction. The first section G is optimized to simultaneously reduce the generator loss, mean square error (MSE) loss and cross-entropy (CE) loss.…”
Section: Denoising System (mentioning, confidence: 99%)
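The quoted passage describes optimizing the generator G to reduce three terms at once: the adversarial generator loss, an MSE loss, and a cross-entropy loss from the speaker-prediction network Gsup. A plain-Python sketch of such a combined objective follows, assuming the Wasserstein generator term is the negated mean critic score, that the MSE compares generated x-vectors with their clean counterparts, and that the terms are summed with hypothetical weights `lambda_mse` and `lambda_ce`. The citing paper's actual weighting is not stated here.

```python
import math

def generator_loss(critic_scores, generated, targets, speaker_logits,
                   speaker_ids, lambda_mse=1.0, lambda_ce=1.0):
    """Multi-task generator objective for a Wasserstein CGAN (sketch).

    critic_scores: critic outputs D(G(x_noisy) | x_noisy), one per example.
    generated, targets: denoised and clean embeddings (lists of equal-length lists).
    speaker_logits: Gsup outputs per example; speaker_ids: true class indices.
    lambda_mse, lambda_ce: hypothetical weighting factors.
    """
    n = len(critic_scores)
    # Wasserstein generator term: raise the critic's score on generated samples.
    adv = -sum(critic_scores) / n
    # Mean squared error between generated and clean embeddings.
    mse = sum(sum((g - t) ** 2 for g, t in zip(gv, tv)) / len(gv)
              for gv, tv in zip(generated, targets)) / n
    # Cross-entropy of the speaker classifier, via a numerically stable log-sum-exp.
    ce = 0.0
    for logits, y in zip(speaker_logits, speaker_ids):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        ce += log_z - logits[y]
    ce /= n
    return adv + lambda_mse * mse + lambda_ce * ce
```

Keeping the weights as parameters makes it easy to trade off fidelity to the clean embedding (MSE) against speaker discriminability (CE).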
“…Its great success in image processing has inspired people to consider whether it can also be applied into the field of speech processing. In the paper [11], Zhang et al attempted to use conditional GAN to solve the impact of performance degradation caused by the variable-duration of utterances. Ding et al [12] proposed a multi-tasking GAN framework to extract the more distinctive speaker representation.…”
Section: Introduction (mentioning, confidence: 99%)