Interspeech 2021
DOI: 10.21437/interspeech.2021-1544
Aligned Contrastive Predictive Coding

Abstract: We investigate the possibility of forcing a self-supervised model, trained using a contrastive predictive loss, to extract slowly varying latent representations. Rather than producing individual predictions for each of the future representations, the model emits a sequence of predictions shorter than the sequence of upcoming representations to which they will be aligned. In this way, the prediction network solves a simpler task of predicting the next symbols, but not their exact timing, while the encoding netwo…
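A minimal sketch of the aligned prediction loss described in the abstract, written in plain Python. It assumes dot-product scores, a shared pool of negative samples, and a Viterbi-style objective that keeps only the best monotonic alignment in which every prediction covers at least one contiguous upcoming frame. These choices, and all function names (`contrastive_log_probs`, `best_alignment`, `acpc_loss`), are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product score between a prediction and an encoded frame."""
    return sum(x * y for x, y in zip(a, b))


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def contrastive_log_probs(preds: Sequence[Vector], frames: Sequence[Vector],
                          negatives: Sequence[Vector]) -> List[List[float]]:
    """lp[k][j] = log-probability that prediction k identifies frame j among
    {frame j} plus a shared pool of negatives (InfoNCE-style, assumed)."""
    lp = []
    for p in preds:
        neg_scores = [dot(p, n) for n in negatives]
        row = []
        for z in frames:
            pos = dot(p, z)
            row.append(pos - logsumexp([pos] + neg_scores))
        lp.append(row)
    return lp


def best_alignment(lp: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Best monotonic alignment of K predictions (rows) to M >= K frames
    (columns): frame 0 goes to prediction 0, the last frame to prediction
    K-1, and each step either stays on the same prediction or advances by 1."""
    K, M = len(lp), len(lp[0])
    if K > M:
        raise ValueError("need at least as many upcoming frames as predictions")
    NEG = -math.inf
    best = [[NEG] * M for _ in range(K)]
    advanced = [[False] * M for _ in range(K)]  # backpointers
    best[0][0] = lp[0][0]
    for j in range(1, M):
        for k in range(K):
            stay = best[k][j - 1]
            advance = best[k - 1][j - 1] if k > 0 else NEG
            if advance > stay:
                best[k][j] = lp[k][j] + advance
                advanced[k][j] = True
            elif stay > NEG:
                best[k][j] = lp[k][j] + stay
    # Walk the backpointers from the last frame to recover the alignment.
    alignment = [0] * M
    k = K - 1
    for j in range(M - 1, -1, -1):
        alignment[j] = k
        if j > 0 and advanced[k][j]:
            k -= 1
    return best[K - 1][M - 1], alignment


def acpc_loss(preds, frames, negatives) -> float:
    """Negative log-probability of the best alignment (assumed objective)."""
    score, _ = best_alignment(contrastive_log_probs(preds, frames, negatives))
    return -score
```

With K = 2 predictions and M = 3 frames, only the alignments [0, 0, 1] and [0, 1, 1] are admissible; the dynamic program keeps whichever scores higher, so the prediction network is not penalized for misjudging how long each symbol lasts.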

Cited by 15 publications (12 citation statements)
References 10 publications
“…Instead of treating classification independently for each future time-step as in standard CPC, the aligned CPC (ACPC) model of Chorowski et al. [21] outputs a sequence of predictions that are then aligned to future time-steps. Since the model encourages piece-wise constant latent features, the idea is that changes in these features would correspond to phone boundaries.…”
Section: Related Work
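The boundary idea in the quote above can be illustrated with a hypothetical post-processing step: if the latent features are close to piece-wise constant, a jump between consecutive frames marks a candidate boundary. The Euclidean-distance criterion, the `threshold` parameter, and the function name are assumptions for illustration; the citing work may detect boundaries differently.

```python
import math
from typing import List, Sequence


def boundaries_from_latents(latents: Sequence[Sequence[float]],
                            threshold: float) -> List[int]:
    """Return the frame indices t at which latents[t] differs from
    latents[t - 1] by more than `threshold` (Euclidean distance).

    With piece-wise constant features, distances inside a segment stay small
    and spike where one segment ends and the next begins.
    """
    boundaries = []
    for t in range(1, len(latents)):
        if math.dist(latents[t - 1], latents[t]) > threshold:
            boundaries.append(t)
    return boundaries
```

The threshold is a free parameter in this sketch and would have to be tuned on held-out data.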
“…Segmentation performance increases by adding to the model in [1] a second CPC at the segment level (as in [2]). Interestingly, ACPC [5] and mACPC do not attain the same segmentation performance level despite their similarities and the offset correction. On the other hand, they do achieve much better phoneme prediction rates, both frame synced (frame-wise accuracy) and through alignment (CTC PER).…”
Section: Comparative Study In Segmentation and Classification Of Phon...
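Frame-wise accuracy and the alignment-based (CTC) phoneme error rate mentioned above can be computed roughly as follows. This sketch assumes integer phone labels, a hypothetical blank index of 0, and greedy CTC decoding (merge repeated labels, then drop blanks); the evaluation protocol in the cited study may differ, for example in how the frame-level labels are obtained.

```python
from typing import List, Sequence

BLANK = 0  # hypothetical CTC blank index


def frame_accuracy(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Fraction of frames whose predicted label matches the reference."""
    if len(pred) != len(ref):
        raise ValueError("frame sequences must have equal length")
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


def ctc_greedy_collapse(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out


def edit_distance(hyp: Sequence, ref: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev_row = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        row = [i]
        for j, r in enumerate(ref, start=1):
            row.append(min(prev_row[j] + 1,              # deletion
                           row[j - 1] + 1,               # insertion
                           prev_row[j - 1] + (h != r)))  # substitution
        prev_row = row
    return prev_row[-1]


def phone_error_rate(hyp: Sequence[int], ref: Sequence[int]) -> float:
    """Edit distance normalized by the reference length."""
    return edit_distance(hyp, ref) / len(ref)
```

For example, frame labels [1, 1, 0, 1, 2, 2, 0, 0, 3] decode to [1, 1, 2, 3]: the blank between the two 1s keeps them as separate phones.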
“…Finally, K and K_s predictions are made at frame and segment levels conditioned on the corresponding context vectors, which are then aligned to M and M_s upcoming encoded frames and segments respectively. The ACPC prediction loss, as described in [5], is applied at both levels. The two prediction losses from frames and segments are summed into the total loss to be optimized.…”
Section: Multi-level ACPC
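The summed two-level objective described above can be sketched as follows, assuming the contrastive log-probability matrices for each level (K x M for frames, K_s x M_s for segments) have already been computed and that both levels use the same best-alignment objective. How segments are formed and how the matrices are scored is left out; `acpc_level_loss` and `multilevel_acpc_loss` are hypothetical names.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def acpc_level_loss(lp: Matrix) -> float:
    """Negative score of the best monotonic alignment of K predictions (rows)
    to M >= K upcoming items (columns), where every prediction covers at
    least one contiguous item (assumed Viterbi-style objective)."""
    K, M = len(lp), len(lp[0])
    if K > M:
        raise ValueError("need at least as many upcoming items as predictions")
    NEG = -math.inf
    prev = [lp[0][0]] + [NEG] * (K - 1)  # scores after the first item
    for j in range(1, M):
        cur = [NEG] * K
        for k in range(K):
            stay = prev[k]                           # same prediction continues
            advance = prev[k - 1] if k > 0 else NEG  # move to next prediction
            cur[k] = lp[k][j] + max(stay, advance)
        prev = cur
    return -prev[K - 1]


def multilevel_acpc_loss(frame_lp: Matrix, segment_lp: Matrix) -> float:
    """Total loss: frame-level ACPC loss plus segment-level ACPC loss."""
    return acpc_level_loss(frame_lp) + acpc_level_loss(segment_lp)
```

Keeping the two levels as separate terms of a plain sum mirrors the quoted description; a weighting between them would be an extra assumption not stated in the excerpt.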