2020
DOI: 10.48550/arxiv.2010.14230
Preprint

A Comparison of Discrete Latent Variable Models for Speech Representation Learning

Abstract: Neural latent variable models enable the discovery of interesting structure in speech audio data. This paper compares two approaches, broadly based on predicting future time-steps or on auto-encoding the input signal. Our study compares the representations learned by VQ-VAE and vq-wav2vec in terms of sub-word unit discovery and phoneme recognition performance. Results show that future time-step prediction with vq-wav2vec achieves better performance. The best system achieves an e…

citations
Cited by 0 publications
references
References 11 publications
(9 reference statements)