2022
DOI: 10.48550/arxiv.2203.05936
Preprint

Are discrete units necessary for Spoken Language Modeling?

Tu Anh Nguyen, Benoit Sagot, Emmanuel Dupoux

Abstract: Recent work in spoken language modeling shows the possibility of learning a language unsupervisedly from raw audio without any text labels. The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then training a language model directly on such pseudo-text. Is such a discrete bottleneck necessary, potentially introducing irreversible errors in the encoding of the speech signal, or could we learn a language model without discrete units at all? In this work, we show…
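For readers unfamiliar with the pipeline the abstract describes, the sketch below illustrates the discrete-unit ("pseudo-text") approach under stated assumptions: the encoder and codebook are random stand-ins (in practice, self-supervised features such as CPC/HuBERT and k-means centroids over them), and none of the function names come from the authors' code.

import numpy as np

def encode_audio(waveform: np.ndarray) -> np.ndarray:
    # Hypothetical acoustic encoder: waveform -> (T, D) frame features.
    # A real system would use a self-supervised model (CPC, wav2vec 2.0,
    # HuBERT, ...); random features stand in for one here.
    num_frames = max(1, len(waveform) // 320)  # ~one frame per 20 ms at 16 kHz
    return np.random.default_rng(0).standard_normal((num_frames, 256))

def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    # Map each frame to the index of its nearest codebook entry, turning
    # continuous features into a sequence of discrete units ("pseudo-text").
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)  # (T,) array of unit ids

# e.g. 50 centroids from k-means over encoder features (random here)
codebook = np.random.default_rng(1).standard_normal((50, 256))
units = quantize(encode_audio(np.zeros(16000)), codebook)
print(units[:10])  # unit-id sequence: the input a language model is trained on

The paper's question is whether this discrete bottleneck (the argmin above, which discards everything but the unit id) is necessary, or whether a language model could be trained on the continuous features directly.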

Cited by 1 publication (1 citation statement)
References 8 publications (15 reference statements)
“…, $u^q_T$: $u^q_k = q(u_k)$. Quantizing the targets has been shown to improve the quality of speech representations in contrastive prediction-based self-supervised learning [Baevski et al., 2020, Nguyen et al., 2022], perhaps by contributing to the prior of discreteness we wish to impose on the high-level features.…”
Section: Unit Discovery in Speech
Confidence: 99%
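The quoted passage refers to replacing each continuous target frame $u_k$ with its nearest codebook vector before it is used as a prediction target. A minimal sketch of such a nearest-neighbour quantizer $q$, with a random codebook standing in for a learned one (all names here are hypothetical, not from the cited papers):

import numpy as np

def q(u_k: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    # Nearest-neighbour quantizer: u^q_k = q(u_k) is the codebook vector
    # closest (in squared Euclidean distance) to the frame u_k.
    idx = ((codebook - u_k) ** 2).sum(axis=1).argmin()
    return codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((320, 64))  # V entries of dimension D (random stand-in)
targets = rng.standard_normal((100, 64))   # continuous frames u_1 ... u_T
quantized = np.stack([q(u, codebook) for u in targets])  # u^q_1 ... u^q_T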