2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7179087

Unsupervised neural network based feature extraction using weak top-down constraints

Abstract: Deep neural networks (DNNs) have become a standard component in supervised ASR, used in both data-driven feature extraction and acoustic modelling. Supervision is typically obtained from a forced alignment that provides phone class targets, requiring transcriptions and pronunciations. We propose a novel unsupervised DNN-based feature extractor that can be trained without these resources in zero-resource settings. Using unsupervised term discovery, we find pairs of isolated word examples of the same unknown type…
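The abstract is cut off at this point, but the pair-based training idea it sets up can be sketched. Below is a minimal illustration, assuming the discovered word pairs come as MFCC frame matrices and that frames of each pair are matched with a plain dynamic time warping (DTW) alignment before network training; the function names and the Euclidean frame distance are illustrative assumptions, not details quoted from the paper.

```python
import numpy as np

def dtw_align(x, y):
    """Align two frame sequences of shape (n_x, d) and (n_y, d) with plain
    DTW and return the (i, j) index pairs on the optimal warping path.
    Euclidean frame distance is assumed."""
    n_x, n_y = len(x), len(y)
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    acc = np.full((n_x + 1, n_y + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n_x + 1):
        for j in range(1, n_y + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack to recover the warping path.
    path, i, j = [], n_x, n_y
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def frame_pairs(word_pairs):
    """Hypothetical helper: `word_pairs` is a list of (X, Y) MFCC matrices for
    two discovered examples of the same unknown word type; each aligned frame
    pair becomes one (input, target) training example for the network."""
    for X, Y in word_pairs:
        for i, j in dtw_align(X, Y):
            yield X[i], Y[j]
```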

Cited by 100 publications (134 citation statements). References 20 publications (38 reference statements).

“…Other methods incorporate weak top-down supervision by first extracting pairs of similar word- or phrase-like units using unsupervised term detection, and using these to constrain the representation learning. Examples include the correspondence autoencoder (cAE) [3] and ABNet [13]. Both aim to learn representations that make similar pairs even more similar; the ABNet additionally tries to make different pairs more different.…”
Section: A. Background and Motivation
Citation type: mentioning, confidence: 99%
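As an illustration of the "make similar pairs more similar, different pairs more different" objective mentioned in this excerpt, here is a generic siamese contrastive loss in PyTorch. This is a sketch of the general idea only, not the ABNet's actual loss function; the encoder architecture, input dimension, and margin are arbitrary assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

# Illustrative siamese frame encoder; the input dimension (39, e.g. MFCCs
# with deltas) and layer sizes are assumptions.
encoder = nn.Sequential(nn.Linear(39, 256), nn.ReLU(), nn.Linear(256, 39))

def contrastive_loss(x_a, x_b, same, margin=1.0):
    """Pull embeddings of same-type pairs together, push different-type
    pairs at least `margin` apart. `same` is a float tensor of 0/1 flags."""
    d = F.pairwise_distance(encoder(x_a), encoder(x_b))
    return (same * d.pow(2) + (1.0 - same) * F.relu(margin - d).pow(2)).mean()
```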
“…To obtain useful features, it is essential to pretrain the CAE as a conventional AE [31]. Our CAE has the same structure as the AE described in Section 5.1 and pretraining follows the same procedure described there.…”
Section: Correspondence Autoencoder Features
Citation type: mentioning, confidence: 99%
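A minimal sketch of the two-stage recipe this excerpt describes: the network is first trained as a conventional autoencoder on unlabelled frames (target = input), and the resulting weights are then reused for correspondence training. The layer sizes, frame dimensionality, and optimiser settings below are assumptions; the architecture referred to as Section 5.1 in the excerpt is not reproduced here.

```python
import torch
import torch.nn as nn

# Placeholder architecture: frame dimensionality 39 and one hidden layer.
ae = nn.Sequential(nn.Linear(39, 100), nn.Tanh(), nn.Linear(100, 39))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
mse = nn.MSELoss()

def pretrain_as_autoencoder(frame_batches, epochs=5):
    """Stage 1: conventional AE pretraining, target = the input itself."""
    for _ in range(epochs):
        for x in frame_batches:          # x: (batch, 39) unlabelled frames
            opt.zero_grad()
            mse(ae(x), x).backward()
            opt.step()
    # Stage 2 (not shown): keep these weights and continue training with
    # correspondence targets, as in the sketch after the next excerpt.
```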
“…Another type of neural approach that has seen success in learning acoustic features is the correspondence autoencoder (CAE) [28]. Instead of trying to reconstruct its input, as is done in a standard autoencoder [29], the CAE tries to reconstruct another instance of the same type as its input.…”
Section: Introduction
Citation type: mentioning, confidence: 99%
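The distinction drawn in this excerpt can be made concrete with a short sketch of one correspondence-autoencoder (CAE) update: the network reads a frame from one word example and is trained to reconstruct the aligned frame from another example of the same unknown type, rather than its own input. The architecture and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

# Placeholder CAE with the same assumed shape as the AE sketch above.
cae = nn.Sequential(nn.Linear(39, 100), nn.Tanh(), nn.Linear(100, 39))
opt = torch.optim.Adam(cae.parameters(), lr=1e-3)
mse = nn.MSELoss()

def cae_step(x_a, x_b):
    """One training step on an aligned frame pair (x_a, x_b) of the same type."""
    opt.zero_grad()
    loss = mse(cae(x_a), x_b)   # target is the *other* instance, not x_a
    loss.backward()
    opt.step()
    return loss.item()
```

After training, features would typically be read off a hidden (bottleneck) layer rather than taken from the reconstruction output.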