Interspeech 2021
DOI: 10.21437/interspeech.2021-349
Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

Abstract: Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on the autoregressive model and/or observed global dependencies while generating the representation. In this work, we propose Non-Autoregressive Predictive Coding (NPC), a self-supervised method, to learn a speech representation in a non-autoregressive manner by relying only on local dependencies of speech. NPC has a conceptually simple objective…
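To make the abstract's idea concrete, here is a rough sketch of predicting each frame from a small window of neighbouring frames whose centre is masked out, so the representation uses only local, non-autoregressive context. It is a minimal illustration in PyTorch; the module names, kernel size, mask width, and L1 reconstruction loss are assumptions for exposition, not the authors' released implementation.

```python
# Illustrative sketch of the NPC idea: predict each frame from a small local
# window whose centre (the target frame and its immediate neighbours) is hidden,
# so nothing autoregressive or global is used. Names and hyperparameters are
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedConv1d(nn.Conv1d):
    """1-D convolution whose central taps are zeroed, blocking a small
    region around the current frame from contributing to its own prediction."""

    def __init__(self, in_ch, out_ch, kernel_size, mask_width):
        super().__init__(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
        mask = torch.ones(1, 1, kernel_size)
        centre = kernel_size // 2
        half = mask_width // 2
        mask[..., centre - half:centre + half + 1] = 0.0  # hide the target region
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.conv1d(x, self.weight * self.mask, self.bias, padding=self.padding)


class NPCSketch(nn.Module):
    """Reconstruct each input frame from masked local context only."""

    def __init__(self, feat_dim=80, hidden=256, kernel_size=15, mask_width=5):
        super().__init__()
        self.context = MaskedConv1d(feat_dim, hidden, kernel_size, mask_width)
        self.head = nn.Conv1d(hidden, feat_dim, kernel_size=1)

    def forward(self, feats):               # feats: (batch, time, feat_dim)
        x = feats.transpose(1, 2)           # -> (batch, feat_dim, time)
        rep = torch.relu(self.context(x))   # local, non-autoregressive context
        recon = self.head(rep).transpose(1, 2)
        loss = F.l1_loss(recon, feats)      # reconstruct the hidden centre frame
        return loss, rep.transpose(1, 2)    # per-frame representation


if __name__ == "__main__":
    model = NPCSketch()
    dummy = torch.randn(4, 100, 80)         # e.g. 4 utterances of 100 mel frames
    loss, reps = model(dummy)
    print(loss.item(), reps.shape)          # torch.Size([4, 100, 256])
```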

Cited by 39 publications (26 citation statements)
References 17 publications (40 reference statements)
“…Some studies explore alternatives to masking the input directly. In non-autoregressive predictive coding (NPC) [89], time masking is introduced through masked convolution blocks. Taking inspiration from XLNet [90], it has also been suggested that the input be reconstructed from a shuffled version [91] to address the discrepancy between pre-training and fine-tuning of masking-based approaches.…”
Section: Multi-Target APC / BEST-RQ
Citation type: mentioning
confidence: 99%
“…Transformer encoders and bidirectional RNNs have been considered as context networks for realising MPC. Similarly, the recently proposed Non-autoregressive predictive coding (NPC) [52] also applies a mask on its model input, but it learns representations based on local dependencies of an input sequence, rather than globally. The MPC approaches can learn effective representations of sequential data in a non-autoregressive way, and hence achieve considerable speed-up in training.…”
Section: B. SSL Framework
Citation type: mentioning
confidence: 99%
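The masked predictive coding setup mentioned above can be sketched as follows: a fraction of input frames is zeroed out and a Transformer encoder reconstructs the original features at the masked positions, with all positions predicted in parallel (non-autoregressively). Layer sizes, the per-frame masking policy, and the L1 loss below are illustrative assumptions, not a specific paper's recipe.

```python
# Illustrative sketch of masked predictive coding (MPC): corrupt random input
# frames and train a Transformer encoder to reconstruct them.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MPCSketch(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, nhead=4, layers=3):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)
        self.head = nn.Linear(d_model, feat_dim)

    def forward(self, feats, mask_prob=0.15):   # feats: (batch, time, feat_dim)
        # Choose frames to hide; real recipes typically mask contiguous spans.
        mask = torch.rand(feats.shape[:2], device=feats.device) < mask_prob
        corrupted = feats.masked_fill(mask.unsqueeze(-1), 0.0)
        hidden = self.encoder(self.proj(corrupted))
        recon = self.head(hidden)
        # Reconstruction loss only on the masked positions.
        loss = F.l1_loss(recon[mask], feats[mask])
        return loss, hidden


if __name__ == "__main__":
    loss, reps = MPCSketch()(torch.randn(2, 120, 80))
    print(loss.item(), reps.shape)
```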
“…Generative approaches learn SSL representations by reconstructing input features given historical or unmasked ones. Representative models in this type include APC (Chung et al., 2019; Chung and Glass, 2020a,b), VQ-APC (Chung et al., 2020), DeCoAR*, DeCoAR 2.0 (Ling and Liu, 2020)*, Mockingjay*, TERA (Liu et al., 2021b)*, MPC (Jiang et al., 2019, 2021), pMPC (Yue and Li, 2021), speech-XLNet (Song et al., 2020), NPC (Liu et al., 2021a), and PASE+ (Pascual et al., 2019; Ravanelli et al., 2020). Contrastive approaches pre-train representations to distinguish negative examples from real ones.…”
Section: Speech SSL Approaches
Citation type: mentioning
confidence: 99%
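For the contrastive family named in the last sentence of the quote above, a common instantiation is an InfoNCE-style loss in which a context vector is scored against its time-aligned target encoding (positive) and randomly drawn frames (negatives). The function below is a hedged sketch; the uniform negative sampling and temperature value are assumptions rather than the recipe of any model cited here.

```python
# Illustrative InfoNCE-style contrastive loss: distinguish the true aligned
# target frame from randomly sampled negative frames of the same utterance.
import torch
import torch.nn.functional as F


def info_nce(context, targets, num_negatives=10, temperature=0.1):
    """context, targets: (batch, time, dim); the positive for context[:, t]
    is targets[:, t]. Real recipes usually exclude the positive index when
    sampling negatives; this sketch does not."""
    b, t, d = targets.shape
    neg_idx = torch.randint(0, t, (b, t, num_negatives), device=targets.device)
    batch_idx = torch.arange(b, device=targets.device).view(b, 1, 1)
    negatives = targets[batch_idx, neg_idx]                  # (b, t, K, d)
    candidates = torch.cat([targets.unsqueeze(2), negatives], dim=2)  # (b, t, 1+K, d)
    logits = torch.einsum("btd,btkd->btk", context, candidates) / temperature
    labels = torch.zeros(b, t, dtype=torch.long, device=targets.device)  # index 0 = positive
    return F.cross_entropy(logits.reshape(b * t, -1), labels.reshape(-1))


if __name__ == "__main__":
    c = torch.randn(4, 50, 256)   # context network outputs
    z = torch.randn(4, 50, 256)   # encoded target frames
    print(info_nce(c, z).item())
```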