Interspeech 2021
DOI: 10.21437/interspeech.2021-1874

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation

Abstract: Automatic detection of phoneme or word-like units is one of the core objectives in zero-resource speech processing. Recent attempts employ self-supervised training methods, such as contrastive predictive coding (CPC), where the next frame is predicted given past context. However, CPC only looks at the audio signal's frame-level structure. We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level, e.g., at the phoneme level. I…
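As a rough illustration of the frame-level objective the abstract refers to, the sketch below computes an InfoNCE-style contrastive loss for a single prediction step: the vector predicted from past context should score the true next frame higher than distractor frames. Dot-product scoring, the function names, and the toy vectors are assumptions for illustration; the paper's encoder, prediction head, and negative sampling are not described in this excerpt.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(prediction, positive, negatives):
    """Contrastive loss for one prediction step (illustrative sketch).

    prediction: vector predicted from past context for the next frame
    positive:   encoder output of the true next frame
    negatives:  encoder outputs of distractor frames
    Returns -log of the softmax probability assigned to the positive.
    """
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    m = max(scores)  # subtract the max before exponentiating, for stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denom - scores[0]


# Toy check: the prediction matches the true next frame only in the first case.
good = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(good < bad)  # True
```

Minimising this loss pushes the predictor to separate the true continuation from distractors, which is the frame-level behaviour SCPC extends to segments.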

Cited by 29 publications (33 citation statements).
References 24 publications (53 reference statements).
“…Table I shows that the DPDP approach outperforms [25], [26], but performs slightly worse on most metrics compared to the recent state-of-the-art unsupervised phone segmentation approaches [13], [14], [38]. Of the three DPDP systems, the CPC+K-means model (here used as a DPDP scoring network for the first time) achieves the best precision and R-value, … [Section V. DPDP for Unsupervised Word Segmentation from Symbolic Input] The goal in unsupervised word segmentation from symbolic input is to break up an input sequence (normally of phonemes or phones) into subsequences representing words.…”
Section: Intermediate Evaluation: Unsupervised Phone Segmentation (mentioning)
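The quoted comparison reports boundary precision and the R-value. A minimal sketch of how such boundary metrics are commonly computed follows, assuming boundaries are frame indices and a prediction counts as a hit when it lies within a fixed tolerance of a not-yet-matched reference boundary (greedy matching, a simplification). The R-value formula follows Räsänen et al. (2009); the tolerance value and function names are illustrative assumptions, not taken from the cited paper.

```python
import math


def boundary_scores(predicted, reference, tolerance=2):
    """Precision, recall and F-score for boundary positions (in frames).

    Greedy matching: each predicted boundary is matched to the first
    unmatched reference boundary within `tolerance` frames, if any.
    """
    unmatched = sorted(reference)
    hits = 0
    for p in sorted(predicted):
        for r in unmatched:
            if abs(p - r) <= tolerance:
                unmatched.remove(r)  # safe: we break out of the loop right away
                hits += 1
                break
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f


def r_value(precision, recall):
    """R-value: penalises over-segmentation that F-score can reward."""
    if precision == 0:
        return 0.0  # over-segmentation is undefined here; this sketch reports 0
    over_seg = recall / precision - 1
    r1 = math.sqrt((1 - recall) ** 2 + over_seg ** 2)
    r2 = (-over_seg + recall - 1) / math.sqrt(2)
    return 1 - (abs(r1) + abs(r2)) / 2


p, r, f = boundary_scores([10, 20, 30], [11, 20, 40], tolerance=2)
print(round(p, 3), round(r, 3), round(r_value(p, r), 3))  # 0.667 0.667 0.715
```

A perfect segmentation (precision = recall = 1) gives an R-value of 1; predicting many spurious boundaries raises the over-segmentation term and lowers it.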
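To make the task definition in the quoted passage concrete, the sketch below splits a phone sequence into words by dynamic programming over all boundary positions, minimising a sum of per-word costs. The cost function (a tiny hard-coded lexicon plus a per-symbol penalty for unknown chunks), the phone labels, and `max_len` are hypothetical; the DPDP systems in the quote score segments with a learned scoring network, which is not reproduced here.

```python
import math


def segment(symbols, word_cost, max_len=10):
    """Split `symbols` into words minimising the total word cost.

    word_cost(tuple_of_symbols) -> float is a hypothetical scoring function;
    dynamic programming finds the exact minimum over all segmentations
    whose words are at most `max_len` symbols long.
    """
    n = len(symbols)
    best = [0.0] + [math.inf] * n  # best[i]: min cost of segmenting symbols[:i]
    back = [0] * (n + 1)           # back[i]: start of the last word ending at i
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            cost = best[start] + word_cost(tuple(symbols[start:end]))
            if cost < best[end]:
                best[end], back[end] = cost, start
    words, end = [], n
    while end > 0:  # follow back-pointers from the end of the sequence
        start = back[end]
        words.append(symbols[start:end])
        end = start
    return words[::-1]


# Hypothetical lexicon: known words are cheap, unknown chunks pay per symbol.
LEXICON = {("dh", "ax"): 1.0, ("d", "ao", "g"): 1.5}


def toy_cost(word):
    return LEXICON.get(word, 4.0 * len(word))


print(segment(["dh", "ax", "d", "ao", "g"], toy_cost))
# [['dh', 'ax'], ['d', 'ao', 'g']]
```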
“…The idea is that a model would need to learn meaningful phonetic contrasts while being invariant to nuisance factors such as speaker. Bhati et al. [13] extend this by using a second segment-level CPC layer. The segmental CPC (SCPC) consists of a frame-level CPC module, a differentiable boundary detector operating on the learned features, and a segment-level CPC module operating on aggregated features from the lower layer.…”
Section: Related Work (mentioning)
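The middle stages of the SCPC pipeline described in the quote (boundary detection on learned frame features, then aggregation into segment features) can be sketched as a plain forward pass under loudly stated assumptions: boundaries are placed at local peaks of the frame-to-frame dissimilarity 1 − cos(z_t, z_{t+1}) that exceed a fixed threshold, and each segment's feature is the mean of its frames. The threshold, the peak rule, and the function names are illustrative choices; the sketch omits whatever makes the detector differentiable during training, as well as both CPC losses.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv + 1e-12)  # eps avoids /0


def detect_boundaries(frames, threshold=0.5):
    """Return indices of frames that start a new segment (assumed peak rule).

    d[t] = 1 - cos(frames[t], frames[t + 1]); a boundary is placed at t + 1
    when d[t] exceeds `threshold` and is a local maximum of d.
    """
    d = [1 - cosine(frames[t], frames[t + 1]) for t in range(len(frames) - 1)]
    bounds = []
    for t, value in enumerate(d):
        left = d[t - 1] if t > 0 else -math.inf
        right = d[t + 1] if t + 1 < len(d) else -math.inf
        if value > threshold and value > left and value >= right:
            bounds.append(t + 1)
    return bounds


def pool_segments(frames, bounds):
    """Average the frame vectors inside each segment (segment-level features)."""
    edges = [0] + bounds + [len(frames)]
    segments = []
    for start, end in zip(edges, edges[1:]):
        seg = frames[start:end]
        segments.append([sum(col) / len(seg) for col in zip(*seg)])
    return segments


frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0], [0.0, 1.0]]
bounds = detect_boundaries(frames)
print(bounds)  # [2]
print(pool_segments(frames, bounds))
```

The segment vectors returned by `pool_segments` are what a segment-level contrastive objective would then operate on, in the same way the frame-level loss operates on frames.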