Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection

Wang, Haipeng; Lee, Tan; Leung, Cheung-Chi; Ma, Bin; Li, Haizhou

doi:10.1109/icassp.2013.6639333

Cited by 39 publications

(21 citation statements)

References 18 publications

“…Regarding the features used for query/utterance representation, [5, 13–15] employ Gaussian posteriorgrams; [16] proposes an i-vector-based approach for feature extraction; [17] uses phone log-likelihood ratio-based features; [18] employs posteriorgrams derived from various unsupervised tokenizers, supervised tokenizers, and semi-supervised tokenizers; [19] employs posteriorgrams derived from a Gaussian mixture model (GMM) tokenizer, phoneme recognition, and acoustic segment modelling; [11, 15, 20–26] use phoneme posteriorgrams; [11, 27–29] employ bottleneck features; [30] employs posteriorgrams from non-parametric Bayesian models; [31] employs articulatory class-based posteriorgrams; [32] proposes an intrinsic spectral analysis; and [33] is based on the unsupervised segment-based bag of an acoustic words framework. All these studies employ the standard DTW algorithm for query search, except for [13], which employs the NS-DTW algorithm, [15, 24, 25, 28, 30], which employ the subsequence DTW (S-DTW) algorithm, [14], which presents a variant of the S-DTW algorithm, and [26], which employs the segmental DTW algorithm.…”

Section: Methods Based On Template Matching Of Features (mentioning)

confidence: 99%

ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation

Tejedor, Toledano, López-Otero, et al. 2018. J AUDIO SPEECH MUSIC PROC.

Query-by-example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given an acoustic (spoken) query containing the term of interest as the input. This paper presents the systems submitted to the ALBAYZIN QbE STD 2016 Evaluation held as a part of the ALBAYZIN 2016 Evaluation Campaign at the IberSPEECH 2016 conference. Special attention was given to the evaluation design so that a thorough post-analysis of the main results could be carried out. Two different Spanish speech databases, which cover different acoustic and language domains, were used in the evaluation: the MAVIR database, which consists of a set of talks from workshops, and the EPIC database, which consists of a set of European Parliament sessions in Spanish. We present the evaluation design, both databases, the evaluation metric, the systems submitted to the evaluation, the results, and a thorough analysis and discussion. Four different research groups participated in the evaluation, and a total of eight template matching-based systems were submitted. We compare the systems submitted to the evaluation and make an in-depth analysis based on some properties of the spoken queries, such as query length, single-word/multi-word queries, and in-language/out-of-language queries.

“…The DTW search is carried out for Spanish, English, and European Portuguese languages individually. An additional DTW search based on averaging all the cost matrices given by the three languages is conducted, as in [18].…”

Section: Search (mentioning)

confidence: 99%

ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation

Tejedor, Toledano, López-Otero, et al. 2018. J AUDIO SPEECH MUSIC PROC.
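The search described in the statement above, averaging the cost matrices from several languages (or tokenizers) before running a single DTW pass, can be sketched as follows. This is a minimal illustration, not the cited systems' implementation: the `-log` inner-product frame distance, the uniform default weights, the `(n + m)` length normalisation, and all function names are assumptions chosen for the example.

```python
import numpy as np


def posteriorgram_distance(query, utterance, eps=1e-10):
    """Frame-pair distance -log(q . u) between two posteriorgrams.

    query: (n_frames_q, n_classes), utterance: (n_frames_u, n_classes).
    The -log inner product is an assumed (commonly used) distance.
    """
    return -np.log(np.clip(query @ utterance.T, eps, None))


def combine_matrices(matrices, weights=None):
    """Average same-shaped distance matrices (uniform weights if None)."""
    return np.average(np.stack(matrices), axis=0, weights=weights)


def dtw_cost(dist):
    """Standard DTW over a distance matrix, start-to-end alignment.

    Returns the accumulated cost of the best path divided by (n + m);
    this normalisation is an assumption, not taken from the source.
    """
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)
```

In use, one distance matrix would be computed per tokenizer for the same query/utterance pair (so all matrices share one shape), then passed through `combine_matrices` and `dtw_cost`. Combining at the matrix level means the alignment path is chosen once, using evidence from all tokenizers, rather than fusing separately aligned scores afterwards.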

“…The input features for the ASM tokenizer were the same as those for the GMM tokenizer. Combination of these two tokenizers was performed by the DTW matrix combination approach [11]. PRF and score normalization were used as the back-end.…”

Section: Restricted Systems (mentioning)

confidence: 99%

“…All these tokenizers were used to generate posteriorgrams, and Dynamic Time Warping (DTW) was applied for detection. To exploit the complementary information of all the tokenizers, a DTW matrix combination approach [11] was used. Pseudo relevance feedback (PRF) and score normalization were used as the back-end.…”

Section: Open Systems (mentioning)

confidence: 99%
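The statements above mention score normalization as part of the back-end without saying which scheme was used. As a hedged illustration only, the sketch below applies per-query zero-mean, unit-variance normalization to a set of detection scores, which is one common choice; the function name `znorm_scores` and the choice of z-normalization are assumptions, not details from the cited systems.

```python
import numpy as np


def znorm_scores(scores):
    """Normalise one query's detection scores to zero mean, unit variance.

    Assumed scheme for illustration; returns zeros when all scores are
    equal, since the variance is then zero.
    """
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    if std == 0.0:
        return np.zeros_like(scores)
    return (scores - scores.mean()) / std
```

Normalizing per query puts scores from different queries on a comparable scale, so a single global decision threshold can be applied across them.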