2013 IEEE Workshop on Automatic Speech Recognition and Understanding
DOI: 10.1109/asru.2013.6707765
Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings

Cited by 118 publications (171 citation statements)
References 23 publications
“…Supervised methods include convolutional [11][12][13] and recurrent neural network (RNN) models [14][15][16][17], trained with discriminative classification and contrastive losses. Unsupervised methods include using distances to a fixed reference set [10] and unsupervised autoencoding RNNs [18][19][20]. The recent unsupervised RNN of [21], which we refer to as the correspondence autoencoder RNN (CAE-RNN), is trained on pairs of word-like segments found in an unsupervised way.…”
Section: Introduction (mentioning)
confidence: 99%
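Both the supervised and unsupervised families of methods named in this excerpt hinge on an encoder that maps a variable-length sequence of acoustic frames to a single fixed-dimensional vector. A minimal sketch of such an RNN encoder follows (PyTorch assumed; the class name, dimensions, and MFCC-style input are illustrative, not taken from any of the cited papers):

```python
# Minimal sketch of an RNN acoustic word embedding encoder (PyTorch assumed).
# A variable-length frame sequence is encoded into one fixed-dimensional
# vector; all names and sizes here are illustrative.
import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, feat_dim=39, hidden_dim=256, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, embed_dim)

    def forward(self, frames, lengths):
        # frames: (batch, max_len, feat_dim); lengths: true segment lengths
        packed = nn.utils.rnn.pack_padded_sequence(
            frames, lengths, batch_first=True, enforce_sorted=False)
        _, h_n = self.rnn(packed)          # final hidden state: (1, batch, hidden_dim)
        return self.proj(h_n.squeeze(0))   # fixed-dimensional embedding

encoder = EncoderRNN()
x = torch.randn(4, 80, 39)                 # 4 padded segments, up to 80 frames
z = encoder(x, torch.tensor([80, 60, 75, 50]))
print(z.shape)                             # torch.Size([4, 128])
```

A classification, contrastive, or autoencoding loss would then be applied on top of these embeddings; in the correspondence-autoencoder (CAE-RNN) case, a decoder RNN conditioned on the embedding reconstructs the frames of the paired word-like segment rather than the input itself.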
“…First, features are extracted at the frame level [17,18,19]. Second, DTW is performed to compare the feature matrix of the templates and the test segment.…”
Section: DTW Baseline System (mentioning)
confidence: 99%
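For concreteness, the two-step baseline this excerpt describes, frame-level feature extraction followed by DTW alignment of the template and test feature matrices, can be sketched as follows (NumPy assumed; the function name and the Euclidean frame distance are illustrative choices):

```python
# Minimal DTW sketch (NumPy assumed): aligns the frame-level feature matrix
# of a template with that of a test segment and returns a length-normalised
# alignment cost, as in the DTW baseline described above.
import numpy as np

def dtw_cost(template, segment):
    # template: (T1, D) feature matrix; segment: (T2, D) feature matrix
    T1, T2 = len(template), len(segment)
    # Pairwise frame distances (Euclidean here; cosine is also common)
    dist = np.linalg.norm(template[:, None, :] - segment[None, :, :], axis=-1)
    acc = np.full((T1 + 1, T2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],       # insertion
                acc[i, j - 1],       # deletion
                acc[i - 1, j - 1])   # match
    return acc[T1, T2] / (T1 + T2)

print(dtw_cost(np.random.randn(50, 39), np.random.randn(60, 39)))
```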
“…This embeds audio segments of different length into a fixed-dimensional space, therefore vector distance can be used for similarity measurement. Our method only requires a forward pass computation of the neural network, followed by a vector distance computation, and therefore is more efficient than [15] where an LVCSR is involved and [17] where multiple DTW computations are necessary. It also requires less computation than [18,19] since vector distance is used instead of DTW.…”
Section: Introduction (mentioning)
confidence: 99%
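The efficiency argument in this excerpt amounts to replacing a DTW alignment, whose cost grows with the product of the two segment lengths, by one encoder forward pass per segment plus a cheap vector distance. A sketch, reusing the illustrative EncoderRNN defined in the first example above:

```python
# Embedding-based comparison: one forward pass per segment, then a cosine
# distance in O(embed_dim), independent of segment lengths, unlike DTW's
# O(T1 * T2) alignment. Reuses the illustrative `encoder` defined earlier.
import torch
import torch.nn.functional as F

with torch.no_grad():
    q = encoder(torch.randn(1, 70, 39), torch.tensor([70]))  # query segment
    r = encoder(torch.randn(1, 55, 39), torch.tensor([55]))  # reference segment
    distance = 1.0 - F.cosine_similarity(q, r).item()
print(distance)
```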