Interspeech 2021
DOI: 10.21437/interspeech.2021-349

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

Abstract: Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on an autoregressive model and/or observed global dependencies while generating the representation. In this work, we propose Non-Autoregressive Predictive Coding (NPC), a self-supervised method, to learn a speech representation in a non-autoregressive manner by relying only on local dependencies of speech. NPC has a conceptually simple objective…

Cited by 44 publications (32 citation statements) · References 17 publications
“…Some studies explore alternatives to masking the input directly. In non-autoregressive predictive coding (NPC) [89], time masking is introduced through masked convolution blocks. Taking inspiration from XLNet [90], it has also been suggested that the input be reconstructed from a shuffled version [91] to address the discrepancy between pre-training and fine-tuning of masking-based approaches.…”
Section: Multi-Target APC, BEST-RQ (mentioning)
confidence: 99%
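To make the masked-convolution idea concrete, here is a minimal PyTorch sketch; it is not the authors' implementation, and the channel count, kernel width, and mask width are illustrative. The key point is a 1D convolution whose kernel is zeroed around its center, so the output at time t cannot see the frames it is later asked to reconstruct.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv1d(nn.Module):
    """1D convolution whose kernel is zeroed around its center, so the
    output at time t never depends on frames within +/- mask_size of t.
    A sketch of the masked-convolution idea; all sizes are illustrative."""

    def __init__(self, channels: int, kernel_size: int, mask_size: int):
        super().__init__()
        assert kernel_size % 2 == 1 and kernel_size > 2 * mask_size
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)
        # Binary mask over kernel taps: zero out the center window.
        mask = torch.ones(1, 1, kernel_size)
        center = kernel_size // 2
        mask[..., center - mask_size : center + mask_size + 1] = 0.0
        self.register_buffer("mask", mask)  # saved with the module, not trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); mask the weights, then convolve.
        return F.conv1d(x, self.conv.weight * self.mask, self.conv.bias,
                        padding=self.conv.padding[0])
```

For example, `MaskedConv1d(80, 15, 2)(torch.randn(4, 80, 100))` produces a feature at every step in parallel while hiding each position and the two frames on either side of it, which is what makes such an encoder non-autoregressive yet safe against trivially copying its target.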
“…Transformer encoders and bidirectional RNNs have been considered as context networks for realising MPC. Similarly, the recently proposed Non-Autoregressive Predictive Coding (NPC) [52] also applies a mask to its model input, but it learns representations from the local dependencies of an input sequence rather than from global ones. The MPC approaches can learn effective representations of sequential data in a non-autoregressive way, and hence achieve a considerable speed-up in training.…”
Section: B. SSL Framework (mentioning)
confidence: 99%
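As a rough illustration of the masked-prediction (MPC) objective described above, the following is a hedged sketch; the encoder interface, the function name, and the masking rate are assumptions for illustration, not taken from the cited works.

```python
import torch
import torch.nn.functional as F

def masked_prediction_loss(encoder, frames, mask_prob=0.15):
    """Minimal MPC sketch: corrupt random frames, encode the corrupted
    input, and reconstruct only the masked positions. `encoder` is
    assumed to map (batch, time, dim) -> (batch, time, dim); the name
    and mask_prob are illustrative, not from any paper."""
    batch, time, _ = frames.shape
    mask = torch.rand(batch, time, device=frames.device) < mask_prob
    corrupted = frames.masked_fill(mask.unsqueeze(-1), 0.0)  # zero masked frames
    predicted = encoder(corrupted)
    # L1 reconstruction on the masked positions only.
    return F.l1_loss(predicted[mask], frames[mask])
```

Because every position is predicted in one forward pass rather than step by step, a loss of this shape trains in a non-autoregressive way, which is the source of the training speed-up the quote mentions.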
“…Generative modeling incorporates language model-like training losses to predict unseen regions (such as future or masked frames), in order to maximize the likelihood of the observed data. Examples include APC [43], VQ-APC [44], Mockingjay [45], TERA [46], and NPC [47]. Discriminative modeling aims to discriminate (or contrast) the target unseen frame from randomly sampled ones, which is equivalent to mutual information maximization.…”
Section: B. Self-Supervised Speech Representation Learning (mentioning)
confidence: 99%
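For the discriminative branch, a minimal InfoNCE-style sketch illustrates the "contrast the target frame with randomly sampled ones" idea; the function name, the use of in-batch negatives, and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(context, targets, temperature=0.1):
    """Minimal contrastive (InfoNCE) sketch: each context vector must
    score its own target frame above the targets of the other rows,
    which act as negatives. context, targets: (n, dim). Names and the
    temperature value are illustrative assumptions."""
    context = F.normalize(context, dim=-1)
    targets = F.normalize(targets, dim=-1)
    logits = context @ targets.t() / temperature        # (n, n) similarity scores
    labels = torch.arange(logits.size(0), device=logits.device)
    # Cross-entropy against the diagonal; this objective is a lower
    # bound on the mutual information between context and target.
    return F.cross_entropy(logits, labels)
```

Minimizing this cross-entropy maximizes a lower bound on the mutual information between context and target, which is the equivalence the quoted survey refers to.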