Interspeech 2007
DOI: 10.21437/interspeech.2007-174

Rapid and accurate spoken term detection

Cited by 153 publications
(22 citation statements)
References 3 publications
“…Many previous works consider the problem of developing "offline" (i.e., non-streaming) KWS technologies. In this setting, the dominant paradigm consists of recognizing the entire speech corpus using a large vocabulary continuous speech recognizer (LVCSR) to build word or sub-word lattices, which can then be indexed to perform efficient search, e.g., [1,2,3].…”
Section: Introduction (mentioning)
confidence: 99%
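The offline paradigm described in the statement above (recognize the corpus, then index the resulting lattices for search) can be illustrated with a minimal sketch. It assumes the lattices have already been flattened into word arcs with timings and posterior scores; the `Arc` type, `build_index`, `search`, and the sample data are hypothetical names and values chosen for illustration, not taken from the cited papers.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    """One lattice arc: a word hypothesis with timing and a posterior score."""
    utt_id: str
    word: str
    start: float      # seconds
    end: float        # seconds
    posterior: float  # assumed to be precomputed from the lattice


def build_index(arcs):
    """Build an inverted index mapping each word to the arcs that hypothesize it."""
    index = defaultdict(list)
    for arc in arcs:
        index[arc.word].append(arc)
    return index


def search(index, term, min_posterior=0.0):
    """Return arcs for a single-word term, highest posterior first."""
    hits = [a for a in index.get(term, []) if a.posterior >= min_posterior]
    return sorted(hits, key=lambda a: a.posterior, reverse=True)


# Illustrative data only.
arcs = [
    Arc("utt1", "speech", 0.50, 0.92, 0.81),
    Arc("utt1", "beach", 0.50, 0.90, 0.12),
    Arc("utt2", "speech", 3.10, 3.55, 0.40),
]
index = build_index(arcs)
hits = search(index, "speech", min_posterior=0.3)
```

Because the index is built once, each query reduces to a dictionary lookup plus a filter, which is why this setting suits offline search over a fixed corpus. Multi-word terms and sub-word units would need adjacency checks that this sketch omits.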
“…The term detector was implemented with Lattice2Multigram provided by the Speech Processing Group, FIT, Brno University of Technology. Word-dependent thresholds were applied to improve decision quality [2], [13]. STD performance is reported in terms of ATWV [1]; detection (DET) curves are used to show behaviour at different hit/FA ratios.…”
Section: Methods (mentioning)
confidence: 99%
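The ATWV metric mentioned in the statement above can be sketched as follows. This assumes the commonly used NIST STD 2006 definition: the average over terms of the miss probability plus a weighted false-alarm probability, subtracted from 1, with weight β = 999.9 and one non-target trial per second of speech. The function name `atwv` and the input layout are hypothetical, and the counts are made-up illustration values.

```python
def atwv(counts, speech_seconds, beta=999.9):
    """Actual term-weighted value for a set of hard detection decisions.

    counts maps each term to (n_true, n_correct, n_false_alarm).
    Terms with no reference occurrences are skipped, since their miss
    probability is undefined (an assumption following common practice).
    """
    total = 0.0
    n_terms = 0
    for n_true, n_correct, n_false_alarm in counts.values():
        if n_true == 0:
            continue
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials approximated as one per second of speech.
        p_fa = n_false_alarm / (speech_seconds - n_true)
        total += p_miss + beta * p_fa
        n_terms += 1
    if n_terms == 0:
        raise ValueError("no terms with reference occurrences")
    return 1.0 - total / n_terms


# Illustrative counts only: one term fully detected, one half missed.
example = {"alpha": (10, 10, 0), "bravo": (4, 2, 0)}
score = atwv(example, speech_seconds=3600.0)
```

With these counts there are no false alarms, so the score depends only on the misses. A perfect system scores 1.0, and because β is large a small number of false alarms on a rare term can push the score down sharply, which is one motivation for the word-dependent thresholds mentioned in the statement.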
“…A typical STD system comprises an ASR subsystem for lattice generation and a STD subsystem for term detection, as illustrated in Figure 1. Some state-of-the-art STD systems include those reported in [2]-[7].…”
Section: Introduction (mentioning)
confidence: 99%
“…The proposed approach is similar to previous work on efficient rescoring with neural LMs [20] and to generating lattices in attention-based encoder-decoder models [21]. The ability to produce rich lattices from sequence-to-sequence models has many potential applications: e.g., they can be used for spoken term detection [22]; as inputs to spoken language understanding systems [23]; or for computing word-level posteriors for word- or utterance-level confidence estimation [24]. In experimental evaluations, we find that the models require 5-gram contexts (i.e., conditioning on the four previously predicted labels) in order to obtain comparable WER results as the baseline.…”
Section: Introduction (mentioning)
confidence: 91%