Interspeech 2016
DOI: 10.21437/interspeech.2016-1259

Combining State-Level Spotting and Posterior-Based Acoustic Match for Improved Query-by-Example Spoken Term Detection

Abstract: In spoken term detection (STD) systems, an automatic speech recognition (ASR) front end is often employed for its reasonable accuracy and efficiency. However, the out-of-vocabulary (OOV) problem at the ASR stage has a great impact on STD performance for spoken queries. In this paper, we propose combining the feature-based acoustic match, which is often employed in STD systems for low-resource languages, with other ASR-derived features. First, automatic transcripts for the spoken documents and spoken queries are decomp…
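The "posterior-based acoustic match" in the title compares speech frames as posterior probability vectors (posteriorgrams) rather than as words or phone strings, which is what makes it robust to OOV terms. The abstract is cut off before it specifies the distance, so the sketch below assumes the negative log inner product between posterior vectors, a common choice in posteriorgram-based query-by-example STD. The function name `posteriorgram_distance` and the `eps` floor are hypothetical, not taken from the paper.

```python
import numpy as np


def posteriorgram_distance(query: np.ndarray, doc: np.ndarray,
                           eps: float = 1e-10) -> np.ndarray:
    """Frame-by-frame distance matrix between two posteriorgrams (assumed metric).

    query: (n, K) array; each row is a posterior distribution over K units
           (e.g. phones or HMM states).
    doc:   (m, K) array over the same K units.
    Returns an (n, m) matrix with d(i, j) = -log(query[i] . doc[j]).
    """
    inner = query @ doc.T                    # (n, m) inner products of posterior vectors
    return -np.log(np.maximum(inner, eps))   # floor at eps so log(0) never occurs


# Hypothetical toy posteriorgrams: 2 query frames and 3 document frames over 3 units.
query = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])
doc = np.array([[0.1, 0.1, 0.8], [0.7, 0.2, 0.1], [0.1, 0.7, 0.2]])
print(posteriorgram_distance(query, doc).round(2))
```

Frames whose posteriors agree give distances near zero, while frames with disjoint posteriors approach `-log(eps)`; this matrix is the usual input to a DTW-style search.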

Cited by 3 publications (3 citation statements). References 9 publications.
“…Logistic regression-based fusion on DTW and phone-based systems is employed in [71][72][73][74]. DTW-based search at the HMM state-level from syllables obtained from a word-based speech recognizer and a deep neural network (DNN) posteriorgram-based rescoring are employed in [75], and [76] adds a logistic regression-based approach for detection rescoring. Finally, [77] employs a syllable-based speech recognizer and dynamic programming at the triphone state level to output detections and DNN posteriorgram-based rescoring.…”
Section: Hybrid Methods (mentioning, confidence: 99%)
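Logistic regression-based fusion, as mentioned above, maps the per-detection scores of several systems (for example a DTW-based score and a phone-based score) to a single score interpreted as the probability of a correct detection. The sketch below fits the fusion weights by plain batch gradient descent on the cross-entropy loss. The score columns, the toy data, and the names `fit_logistic_fusion` and `fused_score` are hypothetical; the cited works may use different features, score normalization, or training tools.

```python
import numpy as np


def fit_logistic_fusion(scores: np.ndarray, labels: np.ndarray,
                        lr: float = 0.5, epochs: int = 5000):
    """Fit w, b so that sigmoid(scores @ w + b) estimates P(correct detection).

    scores: (N, S) array, one row per candidate detection, one column per system.
    labels: (N,) array with 1 for a correct detection and 0 for a false alarm.
    """
    n, s = scores.shape
    w = np.zeros(s)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(scores @ w + b)))  # current fused probabilities
        err = p - labels                             # d(cross-entropy)/d(logit) per sample
        w -= lr * (scores.T @ err) / n               # gradient step on the weights
        b -= lr * err.mean()                         # gradient step on the bias
    return w, b


def fused_score(scores: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Fused detection score in (0, 1) for each candidate."""
    return 1.0 / (1.0 + np.exp(-(scores @ w + b)))


# Hypothetical toy data: columns are (DTW-based score, phone-based score).
scores = np.array([[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]])
labels = np.array([1, 1, 0, 0])
w, b = fit_logistic_fusion(scores, labels)
print(w, b, fused_score(scores, w, b))
```

In practice a library solver with regularization would normally replace the hand-written loop; the loop is kept here only to show what the fusion optimizes.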
“…[38][39][40][41] use a logistic regression-based fusion on DTW- and phone-based systems. Oishi et al. [42] use a DTW-based search at the HMM state-level from syllables obtained from a word-based speech recognizer and a deep neural network (DNN) posteriorgram-based rescoring, and [43] adds a logistic regression-based approach for detection rescoring. Obara et al. [44] employ a syllable-based speech recognizer and dynamic programming at the triphone-state level to output detections and DNN posteriorgram-based rescoring.…”
Section: Hybrid Approach (mentioning, confidence: 99%)
“…Query-by-example Spoken Term Detection aims to retrieve data from a speech repository (henceforth utterance) given an acoustic query containing the term of interest as input. QbE STD has been mainly addressed from three different approaches: methods based on the word/subword transcription of the query that typically employ a word/phone-based speech recognition system for query detection [12,13], methods based on template matching of features that are typically based on posteriorgram-based units and DTW-like search for query detections [14,15,16,17], and hybrid approaches that take advantage of both approaches [18,19,20,21].…”
Section: Introduction (mentioning, confidence: 99%)
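The template-matching family described above aligns a query posteriorgram against every region of an utterance with a DTW-like search. The sketch below shows one common variant, subsequence DTW, in which the match may start and end at any utterance frame. The local distance (negative log inner product), the three-step pattern (diagonal, vertical, horizontal), and the name `subsequence_dtw` are assumptions for illustration, not the method of any cited work.

```python
import numpy as np


def subsequence_dtw(query: np.ndarray, doc: np.ndarray, eps: float = 1e-10):
    """Locate the region of `doc` that best matches `query` (both posteriorgrams).

    query: (n, K) posteriorgram of the spoken query.
    doc:   (m, K) posteriorgram of the utterance being searched.
    Returns (cost, start, end): accumulated path cost and the inclusive
    utterance frame range of the best match.
    """
    n, m = query.shape[0], doc.shape[0]
    # Assumed local distance between posterior vectors: -log(q . u), floored at eps.
    dist = -np.log(np.maximum(query @ doc.T, eps))

    acc = np.zeros((n, m))               # accumulated cost D(i, j)
    start = np.zeros((n, m), dtype=int)  # utterance frame where the path to (i, j) began

    # Free start: the first query frame may align with any utterance frame.
    acc[0] = dist[0]
    start[0] = np.arange(m)

    for i in range(1, n):
        # Column 0 can only be reached by a vertical step.
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        start[i, 0] = start[i - 1, 0]
        for j in range(1, m):
            # Predecessors: diagonal, vertical, horizontal.
            preds = ((i - 1, j - 1), (i - 1, j), (i, j - 1))
            best = min(preds, key=lambda p: acc[p])
            acc[i, j] = acc[best] + dist[i, j]
            start[i, j] = start[best]

    # Free end: choose the utterance frame where the last query frame ends cheapest.
    end = int(np.argmin(acc[n - 1]))
    return float(acc[n - 1, end]), int(start[n - 1, end]), end
```

For example, with one-hot frames, a query labelled `[0, 1]` searched in an utterance labelled `[2, 2, 0, 1, 1, 2]` is found at frames 2 to 3 with near-zero cost. Because this unnormalized cost grows with the number of steps, it can favor short alignments; variants that normalize by path length change the recursion and are omitted here.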