2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009)
DOI: 10.1109/icassp.2009.4960494

Effect of pronounciations on OOV queries in spoken term detection

Abstract: The spoken term detection (STD) task aims to return relevant segments from a spoken archive that contain the query terms whether or not they are in the system vocabulary. This paper focuses on pronunciation modeling for Out-of-Vocabulary (OOV) terms which frequently occur in STD queries. The STD system described in this paper indexes word-level and sub-word level lattices or confusion networks produced by an LVCSR system using Weighted Finite State Transducers (WFST). We investigate the inclusion of n-best pro…
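The abstract concerns pronunciation modeling for OOV query terms, including the use of n-best pronunciations. A minimal sketch of that idea follows. It assumes a hypothetical phone-level index (a dict from phone tuples to `(utterance, start_time, posterior)` hits) and positive pronunciation scores from some letter-to-sound model. The function name and the way hits from different variants are merged (maximum of pronunciation weight times hit posterior) are illustrative choices, not necessarily the paper's method.

```python
from collections import defaultdict


def search_with_nbest(nbest, index, n=3):
    """Search for an OOV term using its top-n pronunciation variants.

    nbest: list of (phone_tuple, score) pairs from a letter-to-sound model;
           scores are assumed to be positive, unnormalized likelihoods.
    index: hypothetical phone index, phone_tuple -> [(utt, start, posterior)].
    Returns a dict mapping (utt, start) -> combined detection score.
    """
    top = sorted(nbest, key=lambda p: p[1], reverse=True)[:n]
    total = sum(score for _, score in top)
    combined = defaultdict(float)
    for phones, score in top:
        # Pronunciation probability, renormalized over the kept n-best list.
        weight = score / total
        for utt, start, posterior in index.get(phones, []):
            # Keep the best-supported pronunciation at each location.
            key = (utt, start)
            combined[key] = max(combined[key], weight * posterior)
    return dict(combined)
```

Summing instead of taking the maximum would reward locations matched by several variants; which works better is an empirical question.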

Cited by 44 publications (33 citation statements). References: 10 publications.

“…Spoken Term Detection (STD), defined by NIST as searching vast, heterogeneous audio archives for occurrences of spoken terms (NIST, 2006), is a fundamental building block of such systems (Mamou and Ramabhadran, 2008; Can et al, 2009; Vergyri et al, 2007; Akbacak et al, 2008; Szöke et al, 2008b,a; Thambiratmann and Sridharan, 2007; Wallace et al, 2010; Jansen et al, 2010; Parada et al, 2010; Chan and Lee, 2010; Chen et al, 2010; Motlicek et al, 2010), and its development has been strongly influenced by NIST STD evaluations (NIST, 2006, 2013).…”
Section: Spoken Term Detection (mentioning, confidence: 99%)
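To make "searching an archive for occurrences of a term" concrete for the word-level confusion networks mentioned in the abstract, here is a minimal sketch. It assumes each utterance is stored as a list of bins, each bin a dict from word to posterior; epsilon (skip) arcs, which real confusion networks contain, are ignored, and the function names are hypothetical. A multi-word term is scored as the product of posteriors in consecutive bins. An OOV word never appears in any bin, so it gets no hits, which is why OOV terms need sub-word search.

```python
from collections import defaultdict


def build_cn_index(archive):
    """archive: dict utt_id -> list of bins, each bin a dict word -> posterior.

    Returns an inverted index word -> [(utt_id, bin_index, posterior)].
    """
    index = defaultdict(list)
    for utt, bins in archive.items():
        for i, bin_ in enumerate(bins):
            for word, post in bin_.items():
                index[word].append((utt, i, post))
    return index


def search_term(term, index, archive, threshold=0.0):
    """Find a (possibly multi-word) term; returns hits sorted by score."""
    words = term.split()
    hits = []
    for utt, i, post in index.get(words[0], []):
        score = post
        bins = archive[utt]
        # Remaining words must occupy the immediately following bins.
        for k, w in enumerate(words[1:], start=1):
            if i + k >= len(bins) or w not in bins[i + k]:
                score = 0.0
                break
            score *= bins[i + k][w]
        if score > threshold:
            hits.append((utt, i, score))
    return sorted(hits, key=lambda h: h[2], reverse=True)
```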
“…However, as noted by Logan et al (2000), about 12% of users' queries typically contain out-of-vocabulary (OOV) words, which will never be found in the word lattices, because they do not appear in the LVCSR system vocabulary. Common approaches to solve this problem usually involve producing sub-word (typically phone/phoneme) lattices with the ASR subsystem, and then searching for sub-word representations of the enquiry terms (Saraçlar and Sproat, 2004; Mamou et al, 2007; Can et al, 2009; Szöke et al, 2006; Wallace et al, 2007; Parlak and Saraçlar, 2008). Other sub-word units are possible, such as syllables (Meng et al, 2007), graphemes (Tejedor et al, 2008) or multi-grams (Pinto et al, 2008; Szöke et al, 2008a).…”
Section: Spoken Term Detection (mentioning, confidence: 99%)
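Searching for the sub-word representation of a query can be sketched with a phone n-gram index. As a simplification, this assumes a phonetic 1-best transcript per utterance instead of a full lattice. Candidate positions come from the query's first n-gram and are then verified against the full phone sequence. The order `N = 3` and all names are hypothetical.

```python
from collections import defaultdict

N = 3  # hypothetical n-gram order of the phone index


def build_phone_index(transcripts):
    """transcripts: dict utt_id -> list of phones (e.g. a phonetic 1-best).

    Returns phone n-gram -> set of (utt_id, start_position).
    """
    index = defaultdict(set)
    for utt, phones in transcripts.items():
        for i in range(len(phones) - N + 1):
            index[tuple(phones[i:i + N])].add((utt, i))
    return index


def search_phones(query, index, transcripts):
    """Exact search for a query phone sequence of length >= N."""
    if len(query) < N:
        raise ValueError("query is shorter than the index n-gram order")
    # Candidate start positions are taken from the first query n-gram ...
    candidates = index.get(tuple(query[:N]), set())
    hits = []
    for utt, start in candidates:
        # ... and verified against the full query sequence.
        if transcripts[utt][start:start + len(query)] == list(query):
            hits.append((utt, start))
    return sorted(hits)
```

Exact matching is brittle against recognition errors in the phone string, which is what the soft-match techniques in the next citation statement address.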
“…Soft match is the most common technique for mitigating acoustic variation; it allows for some mismatch between the pronunciation predicted for the search term and the phoneme sequences in the lattice and typically involves a penalty based on either edit distance [13], [36], [37], acoustic confusion [21], [22], [24], [38] or model distance [39], [40]. Lexical deviation, however, has not been widely investigated until recently [3], [25].…”
Section: A. OOV Uncertainty (mentioning, confidence: 99%)
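An edit-distance soft match can be sketched as approximate substring search over a phone string: Sellers' variant of Levenshtein distance, in which a match may begin at any position. This works on a single phone sequence rather than a lattice, which is a simplification. Converting the returned distance into a score penalty (for example an exponential decay in the distance) is left to the caller. The function name and `max_errors` parameter are illustrative.

```python
def soft_match(query, phones, max_errors=1):
    """Approximate substring search (Sellers' algorithm).

    Returns (end_index, distance) pairs for every position in `phones`
    where some substring ending there is within `max_errors` edits of
    `query`. end_index is 1-based (number of phones consumed).
    """
    m = len(query)
    # prev[i]: distance between query[:i] and the best substring ending
    # at the previous position; before any phone it is just i deletions.
    prev = list(range(m + 1))
    matches = []
    for j, p in enumerate(phones, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0 lets a match start anywhere
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == p else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra phone in the index
                         cur[i - 1] + 1)      # query phone missing
        if cur[m] <= max_errors:
            matches.append((j, cur[m]))
        prev = cur
    return matches
```

For example, the query `k ae t` matches `k aa t` inside `dh ax k aa t s` with one substitution, ending after the fifth phone.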
“…The most common approach to STD is the use of a large vocabulary continuous speech recognition (LVCSR) system to obtain word/subword/phonetic lattices that are subsequently indexed [1,2,3]. There are many challenges in finding a good operation point for a Spoken Term Detection (STD) system that balances false alarms and true hits, particularly when the queries are Out of Vocabulary (OOV) terms for the LVCSR system.…”
Section: Motivation (mentioning, confidence: 99%)
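One common way to pick an operating point is to sweep a detection-score threshold and maximize the term-weighted value (TWV) used in the NIST STD evaluations cited earlier. The sketch assumes the NIST 2006 definition: TWV(θ) = 1 - mean over terms of [P_miss(θ) + β·P_FA(θ)], with P_miss = 1 - N_correct/N_true, P_FA = N_FA / (T - N_true), T the seconds of speech, and β = 999.9. Terms with no reference occurrences are skipped. It also assumes detections have already been aligned to the reference, so each carries an `is_correct` flag; the function names are hypothetical.

```python
BETA = 999.9  # false-alarm weight from the NIST 2006 STD definition (assumed)


def term_weighted_value(detections, references, speech_seconds, threshold,
                        beta=BETA):
    """detections: dict term -> list of (score, is_correct) system hits.
    references: dict term -> number of true occurrences in the reference.
    """
    total, n_terms = 0.0, 0
    for term, n_true in references.items():
        if n_true == 0:
            continue  # TWV is averaged only over terms that occur
        kept = [ok for score, ok in detections.get(term, [])
                if score >= threshold]
        n_corr = sum(kept)
        n_fa = len(kept) - n_corr
        p_miss = 1.0 - n_corr / n_true
        p_fa = n_fa / (speech_seconds - n_true)
        total += p_miss + beta * p_fa
        n_terms += 1
    if n_terms == 0:
        raise ValueError("no reference term has any true occurrence")
    return 1.0 - total / n_terms


def best_threshold(detections, references, speech_seconds, candidates):
    """Pick the candidate threshold with the highest TWV."""
    return max(candidates, key=lambda t: term_weighted_value(
        detections, references, speech_seconds, t))
```

The large β means a single false alarm costs about as much as missing a term entirely, which is why OOV queries, with their noisier phone-level hits, make the operating point hard to choose.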
“…There are many challenges in finding a good operation point for a Spoken Term Detection (STD) system that balances false alarms and true hits, particularly when the queries are Out of Vocabulary (OOV) terms for the LVCSR system. In [2], we presented a Weighted Finite State Transducer (WFST) based indexing system modeled along the lines of [3,4], which allows us to use the lattice representation of the audio directly as a query to the search system. This enabled us to compare the performance of the STD system when presented with textual queries and queries represented by sample audio from an existing index and conclude that a two-pass approach that uses the hits from text-based queries to refine search results can enhance the performance of a STD system.…”
Section: Motivation (mentioning, confidence: 99%)
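The statement describes a two-pass approach in which hits from a text query are used to refine search results. The mechanism is not spelled out here, so the following is only a hypothetical sketch of one way to do it. The top-k first-pass hits act as exemplars, and every hit is rescored by mixing its first-pass score with its best similarity to an exemplar other than itself. Phone-sequence similarity via `difflib.SequenceMatcher` stands in for lattice-to-lattice matching; `k`, `alpha`, and the hit format are assumptions.

```python
from difflib import SequenceMatcher


def two_pass_rescore(hits, k=2, alpha=0.5):
    """hits: list of (hit_id, first_pass_score, phone_list) from a text query.

    Returns [(hit_id, new_score)] sorted by the rescored value.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    # Top-k first-pass hits are used as audio exemplars for the second pass.
    exemplars = [phones for _, _, phones in ranked[:k]]
    rescored = []
    for idx, (hit_id, score, phones) in enumerate(ranked):
        # Skip comparing an exemplar with itself (index j == idx).
        sims = [SequenceMatcher(None, phones, ex).ratio()
                for j, ex in enumerate(exemplars) if j != idx]
        sim = max(sims) if sims else 0.0
        rescored.append((hit_id, alpha * score + (1 - alpha) * sim))
    return sorted(rescored, key=lambda h: h[1], reverse=True)
```

In a small example, a hit whose phones closely resemble the confident exemplars can overtake a slightly higher-scoring first-pass hit that resembles none of them.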