Interspeech 2016
DOI: 10.21437/interspeech.2016-691

Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis

Abstract: This paper documents the significant components of a state-of-the-art language-independent query-by-example spoken term detection system designed for the Query by Example Search on Speech Task (QUESST) in MediaEval 2015. We developed exact and partial matching DTW systems and WFST-based symbolic search systems to handle different types of search queries. To handle the noisy and reverberant speech in the task, we trained tokenizers using data augmented with different noise and reverberation conditions. Our post…
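The abstract does not say how the augmented training data was generated. As a hedged illustration only, one common way to make a noisy copy of an utterance is to scale a noise recording so the mixture reaches a chosen signal-to-noise ratio (SNR), and one common way to simulate reverberation is to convolve the clean signal with a room impulse response (RIR). The helper names `add_noise_at_snr` and `add_reverb`, and any SNR values used with them, are hypothetical and not taken from the paper.

```python
import numpy as np


def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `speech` so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; assumes `noise` is not all zeros.
    """
    # Tile the noise if it is shorter than the speech, then trim to the same length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Pick a gain so that 10 * log10(speech_power / (gain**2 * noise_power)) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


def add_reverb(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Simulate reverberation by convolving with a room impulse response.

    The output is trimmed to the input length so frame alignments stay valid.
    """
    return np.convolve(speech, rir)[: len(speech)]
```

Sweeping `snr_db` over several values and applying different RIRs to the same clean corpus would yield multiple training conditions of the kind the abstract mentions; the specific conditions used by the authors are not given here.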

Cited by 20 publications (18 citation statements)
References: 26 publications
“…An information retrieval technique to hypothesize detection and DTW-based score detection are proposed in [39]. Logistic regression-based fusion on DTW and phone-based systems is employed in [71][72][73][74]. DTW-based search at the HMM state-level from syllables obtained from a word-based speech recognizer and a deep neural network (DNN) posteriorgram-based rescoring are employed in [75], and [76] adds a logistic regression-based approach for detection rescoring.…”
Section: Hybrid Methods (mentioning, confidence: 99%)
“…[35][36][37] propose a logistic regression-based fusion of acoustic keyword spotting and DTW-based systems using language-dependent phoneme recognizers. [38][39][40][41] use a logistic regression-based fusion on DTW- and phone-based systems. Oishi et al. [42] use a DTW-based search at the HMM state-level from syllables obtained from a word-based speech recognizer and a deep neural network (DNN) posteriorgram-based rescoring, and [43] adds a logistic regression-based approach for detection rescoring.…”
Section: Hybrid Approach (mentioning, confidence: 99%)
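Several citing works describe logistic regression-based fusion of DTW and phone-based system scores. A minimal sketch, assuming every candidate detection already carries one score per system and a 0/1 hit label from a development set: the fused score is a sigmoid of a learned weighted sum of the per-system scores. The function names `fit_fusion` and `fuse`, the plain gradient-descent trainer, and the learning-rate and epoch defaults are hypothetical choices, not the configuration of any cited system.

```python
import numpy as np


def fit_fusion(scores: np.ndarray, labels: np.ndarray, lr: float = 0.1, epochs: int = 2000):
    """Fit logistic regression weights for score-level fusion.

    scores: (n_detections, n_systems), e.g. columns = [DTW score, phone-based score].
    labels: (n_detections,) array of 0 (false alarm) / 1 (hit).
    Returns (weights, bias).
    """
    n, d = scores.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        z = scores @ w + b
        p = 1.0 / (1.0 + np.exp(-z))      # predicted probability of a hit
        grad = p - labels                 # per-detection gradient of log-loss w.r.t. z
        w -= lr * (scores.T @ grad) / n   # average gradient step for the weights
        b -= lr * grad.mean()             # average gradient step for the bias
    return w, b


def fuse(scores: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Fused detection score: sigmoid of the weighted sum of per-system scores."""
    return 1.0 / (1.0 + np.exp(-(scores @ w + b)))
```

A toolkit implementation of logistic regression would normally replace the hand-written loop; the loop is kept here only so the fusion rule stays visible.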
“…Retrieving spoken content with spoken queries, also known as query-by-example spoken term detection (STD) [1][2][3][4][5][6], is attractive because hand-held or wearable devices make spoken queries a natural choice. The most intuitive way to search over spoken content for a spoken query is to directly match the audio signals to find those audio snippets that sound like the spoken query, and dynamic time warping (DTW) [7] is widely used.…”
Section: Introduction (mentioning, confidence: 99%)
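Matching a spoken query directly against audio with DTW usually means finding the best-matching stretch inside a longer utterance. A minimal sketch of subsequence DTW, assuming query and utterance are frame-level feature matrices with nonzero rows (for example phone posteriorgrams): the query must be aligned from its first to its last frame, but the alignment may start and end at any utterance frame. The cosine frame distance and the names `frame_distances` and `subsequence_dtw` are illustrative assumptions; the paper's exact and partial matching DTW variants are not specified here.

```python
import numpy as np


def frame_distances(query: np.ndarray, utt: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance between query frames (rows) and utterance frames."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    u = utt / np.linalg.norm(utt, axis=1, keepdims=True)
    return 1.0 - q @ u.T  # shape (len(query), len(utt))


def subsequence_dtw(query: np.ndarray, utt: np.ndarray):
    """Return (best_cost, start_frame, end_frame) of the best match of `query` in `utt`."""
    dist = frame_distances(query, utt)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)        # accumulated alignment cost
    start = np.zeros((n, m), dtype=int)  # utterance frame where each best path began

    # Free start: the first query frame may align with any utterance frame.
    acc[0, :] = dist[0, :]
    start[0, :] = np.arange(m)
    for i in range(1, n):
        for j in range(m):
            # Predecessors: vertical (i-1, j), diagonal (i-1, j-1), horizontal (i, j-1).
            candidates = [(acc[i - 1, j], start[i - 1, j])]
            if j > 0:
                candidates.append((acc[i - 1, j - 1], start[i - 1, j - 1]))
                candidates.append((acc[i, j - 1], start[i, j - 1]))
            best_cost, best_start = min(candidates, key=lambda c: c[0])
            acc[i, j] = dist[i, j] + best_cost
            start[i, j] = best_start

    # Free end: the last query frame may align with any utterance frame.
    end = int(np.argmin(acc[-1, :]))
    return float(acc[-1, end]), int(start[-1, end]), end
```

The sketch reports the raw accumulated cost; systems often normalize it by path length before comparing it against a detection threshold.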