2013 IEEE International Conference on Acoustics, Speech and Signal Processing 2013
DOI: 10.1109/icassp.2013.6639333

Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection

Abstract: Recently the posteriorgram-based template matching framework has been successfully applied to query-by-example spoken term detection tasks for low-resource languages. This framework employs a tokenizer to derive posteriorgrams, and applies dynamic time warping (DTW) to the posteriorgrams to locate the possible occurrences of a query term. Based on this framework, we propose to improve the detection performance by using multiple tokenizers with DTW distance matrix combination. The proposed approach uses multipl…
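
The framework summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract is truncated before it states how the distance matrices are combined, so the plain average used here is an assumption, as is the -log inner-product local distance. The search uses subsequence DTW as a compact stand-in for whatever DTW variant the paper applies. All function names are hypothetical.

```python
import numpy as np

def distance_matrix(query, utterance, eps=1e-10):
    """Frame-level distances between two posteriorgrams.

    query: (n_q, n_classes), utterance: (n_u, n_classes); rows sum to 1.
    Uses -log of the inner product (an assumed local distance).
    """
    return -np.log(np.maximum(query @ utterance.T, eps))

def combine_matrices(matrices):
    """Average the distance matrices of several tokenizers (assumed rule)."""
    return np.mean(np.stack(matrices), axis=0)

def subsequence_dtw(dist):
    """Return (cost, end_frame) of the best match of the query in the utterance."""
    n_q, n_u = dist.shape
    acc = np.full((n_q, n_u), np.inf)
    acc[0, :] = dist[0, :]  # the match may start at any utterance frame
    for i in range(1, n_q):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, n_u):
            acc[i, j] = dist[i, j] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    end = int(np.argmin(acc[-1]))
    return float(acc[-1, end]), end

# Toy example: two hypothetical tokenizers with 3 and 4 output classes.
labels_q, labels_u = [0, 1, 2], [2, 2, 0, 1, 2, 1]
d_a = distance_matrix(np.eye(3)[labels_q], np.eye(3)[labels_u])
d_b = distance_matrix(np.eye(4)[labels_q], np.eye(4)[labels_u])
cost, end = subsequence_dtw(combine_matrices([d_a, d_b]))
print(cost, end)
```

Each tokenizer may have a different number of output classes, but every distance matrix has shape (query frames, utterance frames), which is why the combination can happen at the matrix level before a single DTW pass.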

Citations: cited by 39 publications (21 citation statements)
References: 18 publications
“…Regarding the features used for query/utterance representation, [5, 13–15] employ Gaussian posteriorgrams; [16] proposes an i-vector-based approach for feature extraction; [17] uses phone log-likelihood ratio-based features; [18] employs posteriorgrams derived from various unsupervised tokenizers, supervised tokenizers, and semi-supervised tokenizers; [19] employs posteriorgrams derived from a Gaussian mixture model (GMM) tokenizer, phoneme recognition, and acoustic segment modelling; [11, 15, 20–26] use phoneme posteriorgrams; [11, 27–29] employ bottleneck features; [30] employs posteriorgrams from non-parametric Bayesian models; [31] employs articulatory class-based posteriorgrams; [32] proposes an intrinsic spectral analysis; and [33] is based on the unsupervised segment-based bag of an acoustic words framework. All these studies employ the standard DTW algorithm for query search, except for [13], which employs the NS-DTW algorithm, [15, 24, 25, 28, 30], which employ the subsequence DTW (S-DTW) algorithm, [14], which presents a variant of the S-DTW algorithm, and [26], which employs the segmental DTW algorithm.…”
Section: Methods Based On Template Matching Of Features (mentioning)
confidence: 99%
“…The DTW search is carried out for Spanish, English, and European Portuguese languages individually. An additional DTW search based on averaging all the cost matrices given by the three languages is conducted, as in [18].…”
Section: Search (mentioning)
confidence: 99%
“…The input features for the ASM tokenizer were the same as those for the GMM tokenizer. Combination of these two tokenizers was performed by the DTW matrix combination approach [11]. PRF and score normalization were used as the back-end.…”
Section: Restricted Systems (mentioning)
confidence: 99%
“…All these tokenizers were used to generate posteriorgrams, and Dynamic Time Warping (DTW) was applied for detection. To exploit the complementary information of all the tokenizers, a DTW matrix combination approach [11] was used. Pseudo relevance feedback (PRF) and score normalization were used as the back-end.…”
Section: Open Systems (mentioning)
confidence: 99%