Interspeech 2016
DOI: 10.21437/interspeech.2016-313
Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection

Abstract: We propose a framework which ports Dirichlet process Gaussian mixture model (DPGMM) based labels to a deep neural network (DNN). The DNN, trained using the unsupervised labels, is used to extract a low-dimensional unsupervised speech representation, named unsupervised bottleneck features (uBNFs), which capture considerable information for sound cluster discrimination. We investigate the performance of uBNFs in query-by-example spoken term detection (QbE-STD) on the TIMIT English speech corpus. Our uBNF performs comparabl…
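Reading the abstract as a pipeline, a minimal sketch of its three stages is given below: a Dirichlet-process GMM clusters acoustic frames into unsupervised labels, a DNN with a narrow bottleneck layer is trained to predict those labels, and the bottleneck activations are taken as uBNFs. The layer sizes, the number of DPGMM components, the 39-dimensional MFCC input, and the training settings are illustrative assumptions, not the authors' exact recipe.

```python
# A minimal sketch of the pipeline the abstract describes, assuming 39-dim
# MFCC frames as input. Layer sizes, the number of DPGMM components, and the
# training settings are illustrative, not the authors' exact configuration.
import numpy as np
import torch
import torch.nn as nn
from sklearn.mixture import BayesianGaussianMixture

X = np.random.randn(5000, 39).astype(np.float32)  # stand-in for MFCC frames

# 1) DPGMM: a Dirichlet-process GMM assigns each frame an unsupervised label.
dpgmm = BayesianGaussianMixture(
    n_components=100,                              # truncation level, assumed
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag", max_iter=200, random_state=0)
labels = torch.from_numpy(dpgmm.fit_predict(X)).long()

# 2) A DNN trained on the DPGMM labels, with a narrow bottleneck layer.
encoder = nn.Sequential(nn.Linear(39, 512), nn.ReLU(),
                        nn.Linear(512, 512), nn.ReLU(),
                        nn.Linear(512, 40))        # 40-dim bottleneck, assumed
classifier = nn.Sequential(nn.ReLU(), nn.Linear(40, 100))
model = nn.Sequential(encoder, classifier)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
frames = torch.from_numpy(X)
for _ in range(20):                                # a few passes over toy data
    opt.zero_grad()
    loss_fn(model(frames), labels).backward()
    opt.step()

# 3) uBNFs: activations at the bottleneck layer, used as frame features.
with torch.no_grad():
    ubnf = encoder(frames).numpy()                 # shape (n_frames, 40)
```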

Cited by 40 publications (28 citation statements)
References 19 publications
“…As in [3,6,10], three different evaluation metrics are used for QbE speech search: 1) mean average precision (MAP), which is the mean of average precision for each query on search content. 2) Precision of the top N utterances in the test set (P@N), where N is the number of target utterances involving the query term.…”
Section: Methods
confidence: 99%
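The two metrics quoted above have compact definitions; the following sketch, assuming per-query detection scores and binary relevance labels over all test utterances as inputs, shows how MAP and P@N are typically computed.

```python
# A small sketch of the two metrics quoted above. Inputs are assumed to be,
# per query, a score for every test utterance and a binary relevance label.
import numpy as np

def average_precision(scores, relevant):
    """AP for one query over all test utterances."""
    order = np.argsort(scores)[::-1]               # rank utterances by score
    rel = np.asarray(relevant)[order]
    hits = np.cumsum(rel)
    prec_at_k = hits / (np.arange(len(rel)) + 1)
    return prec_at_k[rel == 1].mean() if rel.sum() else 0.0

def p_at_n(scores, relevant):
    """P@N, with N = number of utterances that truly contain the query."""
    n = int(np.sum(relevant))
    top_n = np.argsort(scores)[::-1][:n]
    return np.asarray(relevant)[top_n].mean() if n else 0.0

def mean_average_precision(per_query):
    """MAP: mean of AP over all queries; per_query = [(scores, relevant), ...]."""
    return float(np.mean([average_precision(s, r) for s, r in per_query]))
```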
“…Regarding the features used for query/utterance representation, [5,13-15] employ Gaussian posteriorgrams; [16] proposes an i-vector-based approach for feature extraction; [17] uses phone log-likelihood ratio-based features; [18] employs posteriorgrams derived from various unsupervised tokenizers, supervised tokenizers, and semi-supervised tokenizers; [19] employs posteriorgrams derived from a Gaussian mixture model (GMM) tokenizer, phoneme recognition, and acoustic segment modelling; [11,15,20-26] use phoneme posteriorgrams; [11,27-29] employ bottleneck features; [30] employs posteriorgrams from non-parametric Bayesian models; [31] employs articulatory class-based posteriorgrams; [32] proposes an intrinsic spectral analysis; and [33] is based on the unsupervised segment-based bag-of-acoustic-words framework. All these studies employ the standard DTW algorithm for query search, except for [13], which employs the NS-DTW algorithm, [15,24,25,28,30], which employ the subsequence DTW (S-DTW) algorithm, [14], which presents a variant of the S-DTW algorithm, and [26], which employs the segmental DTW algorithm.…”
Section: Methods Based on Template Matching of Features
confidence: 99%
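Since most of the systems surveyed in that statement rest on DTW-based template matching, a minimal subsequence-DTW (S-DTW) sketch is included here: it aligns a query to the best-matching contiguous span of a longer utterance by letting the warping path start and end at any utterance frame. The Euclidean frame distance and the basic step pattern are illustrative choices; the cited papers differ on both.

```python
# A minimal subsequence-DTW sketch: both inputs are sequences of frame vectors
# (e.g. posteriorgrams or bottleneck features). Euclidean frame distance and
# the basic step pattern are assumptions; the cited systems vary on both.
import numpy as np

def subsequence_dtw_cost(query, utterance):
    """Best alignment cost of `query` against any contiguous span of `utterance`."""
    Q, U = len(query), len(utterance)
    dist = np.linalg.norm(query[:, None, :] - utterance[None, :, :], axis=-1)
    D = np.full((Q + 1, U + 1), np.inf)
    D[0, :] = 0.0                        # a match may start at any utterance frame
    for i in range(1, Q + 1):
        for j in range(1, U + 1):
            D[i, j] = dist[i - 1, j - 1] + min(
                D[i - 1, j],             # stretch the query
                D[i, j - 1],             # stretch the utterance
                D[i - 1, j - 1])         # advance both
    return D[Q, 1:].min()                # a match may end at any utterance frame
```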
“…UAM is a challenging problem with significant practical impact in speech as well as linguistics and cognitive science communities. It has been studied in applications such as ASR for low-resource languages [1], language identification [2] and query-by-example spoken term detection [3]. This problem is also relevant to endangered language protection [4] and understanding infants' language acquisition mechanism [5].…”
Section: Introduction
confidence: 99%