ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683639

Truly Unsupervised Acoustic Word Embeddings Using Weak Top-down Constraints in Encoder-decoder Models

Abstract: We investigate unsupervised models that can map a variable-duration speech segment to a fixed-dimensional representation. In settings where unlabelled speech is the only available resource, such acoustic word embeddings can form the basis for "zero-resource" speech search, discovery and indexing systems. Most existing unsupervised embedding methods still use some supervision, such as word or phoneme boundaries. Here we propose the encoder-decoder correspondence autoencoder (ENCDEC-CAE), which, instead of true w…

Citations: cited by 60 publications (112 citation statements)
References: 34 publications (63 reference statements)
“…We used this data to tune the number of pairs for the CAE-RNN, the vocabulary size for the CLASSIFIERRNN and the number of training epochs. Other hyperparameters are set as in [21].…”
Section: Methods (mentioning)
confidence: 99%
“…We next consider the unsupervised correspondence autoencoder RNN (CAE-RNN) of [21]. Since we do not have access…”
Section: Unsupervised Monolingual Acoustic Embeddings (mentioning)
confidence: 99%
“…The first task is acoustic word discrimination, where we are given two word segments to determine whether they match or not. This task is equivalent to the objective of the single-view approach and has been used in prior papers [9,10,11,12,14,17]. We regard this task as our main evaluation task for training the proposed and baseline network architectures.…”
Section: Evaluation Tasks (mentioning)
confidence: 99%
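
The word discrimination task quoted above can be illustrated with a minimal sketch: two fixed-dimensional embeddings are compared, and the pair is called a match when their distance falls below a threshold. The cosine distance, the threshold of 0.5, the function names and the toy vectors are all assumptions made for illustration; the quoted statement does not say which distance or decision rule the citing paper uses.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def same_word(u, v, threshold=0.5):
    """Decide whether two embeddings match (threshold is an assumed value)."""
    return cosine_distance(u, v) <= threshold


# Hypothetical toy embeddings, not taken from any of the cited papers.
emb_a = [1.0, 0.0, 0.0]
emb_b = [0.9, 0.1, 0.0]  # close in direction to emb_a
emb_c = [0.0, 1.0, 0.0]  # orthogonal to emb_a

print(same_word(emb_a, emb_b))
print(same_word(emb_a, emb_c))
```

In an evaluation setting the threshold would normally be swept over a range rather than fixed, so that performance is summarised across all operating points.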
“…However, both research efforts assumed that training utterances are already segmented into words while learning embeddings in an unsupervised way. Kamper [19] solved this mismatch by using an unsupervised term discovery system to find sample same-word pairs. For evaluating acoustic word embeddings, Ghannay et al. [20,21] proposed to evaluate the intrinsic performances of acoustic word embeddings by comparing embedding similarity with the orthographic and phonetic similarity of the original words.…”
Section: Related Work (mentioning)
confidence: 99%
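
The intrinsic evaluation idea in the statement above, comparing embedding similarity with the orthographic similarity of the underlying words, might be sketched as follows. This is an illustrative assumption rather than the procedure of the cited work: it uses Levenshtein edit distance on spellings, Euclidean distance between embeddings and a Pearson correlation over all word pairs, and it leaves out the phonetic comparison. The toy vocabulary and vectors are hypothetical.

```python
import math
from itertools import combinations


def edit_distance(s, t):
    """Levenshtein distance between two strings (row-by-row dynamic programme)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy embeddings keyed by the word's spelling.
toy = {"cat": [0.0, 0.0], "cab": [0.1, 0.0], "dog": [1.0, 1.0]}

pairs = list(combinations(toy, 2))
orth = [edit_distance(a, b) for a, b in pairs]
emb = [euclidean(toy[a], toy[b]) for a, b in pairs]

# A high positive value means words that are spelled alike also embed close together.
score = pearson(orth, emb)
print(score)
```

A rank correlation could be substituted for `pearson` if only the ordering of pairs is of interest.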