2020
DOI: 10.48550/arxiv.2011.11588
The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

Cited by 13 publications (40 citation statements)
References 0 publications
“…In this paper, we use the CPC-big model from [17] trained on the LibriLight unlab-6k set [5]. The encoder consists of five convolutional layers, each with 512 channels, kernel sizes (10, 8, 4, 4, 4), and strides (5, 4, 2, 2, 2).…”
Section: Analysis of CPC Features, 2.1 Contrastive Predictive Coding (mentioning)
confidence: 99%
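The quoted hyperparameters describe the CPC-big convolutional encoder. As a rough illustration only, the following PyTorch sketch builds a five-layer Conv1d stack with 512 channels, kernel sizes (10, 8, 4, 4, 4) and strides (5, 4, 2, 2, 2); the normalization, activation, and padding choices are assumptions, not taken from the cited implementation.

```python
import torch
import torch.nn as nn

class CPCEncoder(nn.Module):
    """Convolutional encoder with the hyperparameters quoted above:
    five Conv1d layers, 512 channels each, kernel sizes (10, 8, 4, 4, 4)
    and strides (5, 4, 2, 2, 2). Normalization and activation choices
    here are illustrative assumptions."""

    def __init__(self, channels=512):
        super().__init__()
        kernels = (10, 8, 4, 4, 4)
        strides = (5, 4, 2, 2, 2)
        layers = []
        in_ch = 1  # raw waveform, single channel
        for k, s in zip(kernels, strides):
            layers += [
                nn.Conv1d(in_ch, channels, kernel_size=k, stride=s, padding=k // 2),
                nn.BatchNorm1d(channels),
                nn.ReLU(inplace=True),
            ]
            in_ch = channels
        self.net = nn.Sequential(*layers)

    def forward(self, wav):
        # wav: (batch, 1, samples) -> (batch, 512, frames).
        # Total stride is 5*4*2*2*2 = 160, i.e. one frame per 10 ms at 16 kHz.
        return self.net(wav)
```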
“…Finally, the linear classifier Wm is replaced with a single-layer transformer. We use the outputs of the second LSTM layer as speech features since they gave the best ABX phone discrimination results in [17]. In the remainder of the paper we refer to these as the CPC features.…”
Section: Analysis of CPC Features, 2.1 Contrastive Predictive Coding (mentioning)
confidence: 99%
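To make the feature-extraction step concrete, here is a hedged sketch of an autoregressive context network built as a stack of single-layer LSTMs so that the second layer's outputs can be read out directly, as the quote describes. The number of layers and the hidden size are illustrative assumptions, not the cited model's configuration.

```python
import torch
import torch.nn as nn

class CPCContext(nn.Module):
    """Autoregressive context network sketched as a stack of single-layer
    LSTMs so that intermediate layer outputs are accessible."""

    def __init__(self, dim=512, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.LSTM(dim, dim, batch_first=True) for _ in range(num_layers)
        )

    def forward(self, z):
        # z: (batch, frames, dim) encoder outputs.
        outputs = []
        x = z
        for lstm in self.layers:
            x, _ = lstm(x)
            outputs.append(x)
        # outputs[1] is the second LSTM layer, used as the "CPC features" above.
        return outputs

# Usage sketch: encoder frames (batch, 512, T) -> (batch, T, 512) -> layer 2 outputs.
# features = CPCContext()(encoder_frames.transpose(1, 2))[1]
```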
“…• CPC: We use the embeddings extracted from the pretrained CPC model from [51] as input. The model is trained with a context layer and predicts 12 steps into the future.…”
Section: B. Two-Stage Process for Extracting Segmental Representations (mentioning)
confidence: 99%
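The quoted setup, a context network trained to predict 12 future steps, corresponds to a CPC-style InfoNCE objective. The sketch below is a minimal, assumption-laden version: the per-step linear predictors, the in-batch negative sampling, and the tensor shapes are illustrative choices rather than the cited recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cpc_infonce_loss(context, targets, predictors, n_negatives=128):
    """Minimal InfoNCE sketch for CPC-style training over K future steps
    (K = 12 in the quote above).
    context:    (B, T, D) context-network outputs
    targets:    (B, T, D) encoder outputs to be predicted
    predictors: K linear layers, one per prediction offset"""
    B, T, D = targets.shape
    total = 0.0
    for k, W in enumerate(predictors, start=1):
        pred = W(context[:, : T - k])          # predictions k frames ahead, (B, T-k, D)
        pos = targets[:, k:]                   # true future frames,         (B, T-k, D)
        # Draw negatives uniformly from all frames in the batch (illustrative choice).
        neg_idx = torch.randint(0, B * T, (n_negatives,))
        neg = targets.reshape(B * T, D)[neg_idx]               # (N, D)
        pos_logit = (pred * pos).sum(-1, keepdim=True)         # (B, T-k, 1)
        neg_logit = pred @ neg.t()                             # (B, T-k, N)
        logits = torch.cat([pos_logit, neg_logit], dim=-1)
        # The positive example sits at index 0 of each row of logits.
        labels = torch.zeros(logits.shape[:-1], dtype=torch.long, device=logits.device)
        total = total + F.cross_entropy(
            logits.reshape(-1, 1 + n_negatives), labels.reshape(-1)
        )
    return total / len(predictors)

# Example: 12 prediction heads over 512-dimensional features.
predictors = nn.ModuleList(nn.Linear(512, 512, bias=False) for _ in range(12))
```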