Language independent query-by-example spoken term detection using N-best phone sequences and partial matching

Xu, Haihua; Yang, Peng; Xiao, Xiong; Xie, Lei; Leung, Cheung-Chi; Chen, Hongjie; Yu, Jia; Lv, Hang; Wang, Lei; Leow, Su Jun; Ma, Bin; Chng, Eng Siong; Li, Haizhou

doi:10.1109/icassp.2015.7178961

Cited by 14 publications

(5 citation statements)

References 22 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…While our LVCSR-KWS work in Tamil and Vietnamese [24] focus on text queries, we have inspired strategies used in spoken term detection of audio queries. For example, [31] proposed partialmatching symbolic search, which complements popular pattern matching approaches using dynamic time warping in Query-by-Example Search on Speech (QUESST), formerly called Spoken Web Search (SWS), in MediaEval 2014.…”

Section: Discussionmentioning

confidence: 99%

Low-resource keyword search strategies for tamil

Chen

et al. 2015

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

View full text Add to dashboard Cite

We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) Submodular optimization data selection to maximize acoustic diversity through Gaussian component indexed N-grams; (2) Keywordaware language modeling; (3) Subword modeling of morphemes and homophones.

show abstract

Section: Discussionmentioning

confidence: 99%

Low-resource keyword search strategies for tamil

Chen

et al. 2015

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

View full text Add to dashboard Cite

show abstract

“…In the 2014 Queryby-Example Speech Search Task (QUESST) [11], one task was non-exact matching, in which test occurrences could contain small morphological variations with regard to the lexical form of the query. To solve this problem, Xu et al [12] proposed a partial matching strategy in which all partial phone sequences of a query were used to search for matching instances; Proenga et al [13] some of these works attempted to solve the non-exact matching problem, they all used DTW-based matching on frame-level representations, which has been shown to be outperformed by distance-based matching on acoustic word embeddings [2,3,4].…”

Section: Related Workmentioning

confidence: 99%

Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages

Yang

Hirschberg

2019

Interspeech 2019

View full text Add to dashboard Cite

Acoustic word embeddings have been proven to be useful in query-by-example keyword search. Such embeddings are typically trained to distinguish the same word from a different word using exact orthographic representations; so, two different words will have dissimilar embeddings even if they are pronounced similarly or share the same stem. However, in real-world applications such as keyword search in low-resource languages, models are expected to find all derived and inflected forms for a certain keyword. In this paper, we address this mismatch by incorporating linguistic information when training neural acoustic word embeddings. We propose two linguistically-informed methods for training these embeddings, both of which, when we use metrics that consider non-exact matches, outperform state-of-the-art models on the Switchboard dataset. We also present results on Sinhala to show that models trained on English can be directly transferred to embed spoken words in a very different language with high accuracy.

show abstract

“…Our partial matching DTW systems, including fixedwindow [8,16] and phoneme-sequence [17] partial matching systems, were used to deal with T2 and T3 queries. In each fixed-window partial matching system, an analysis window between 70 and 90 frames long was defined.…”

Section: Dtw Systemsmentioning

confidence: 99%

“…Weighted finite state transducer (WFST) based symbolic search systems were used to deal with T2 and T3 queries [8,16]. Such systems decoded a query utterance into N-best phone sequences, and the partial phone sequences were extracted and converted to WFST format.…”

Section: Wfst-based Symbolic Search Systemsmentioning

confidence: 99%

See 1 more Smart Citation

Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis

Leung

Wang

et al. 2016

Interspeech 2016

Self Cite

View full text Add to dashboard Cite

This paper documents the significant components of a state-ofthe-art language-independent query-by-example spoken term detection system designed for the Query by Example Search on Speech Task (QUESST) in MediaEval 2015. We developed exact and partial matching DTW systems, and WFST based symbolic search systems to handle different types of search queries. To handle the noisy and reverberant speech in the task, we trained tokenizers using data augmented with different noise and reverberation conditions. Our postevaluation analysis showed that the phone boundary label provided by the improved tokenizers brings more accurate speech activity detection in DTW systems. We argue that acoustic condition mismatch is possibly a more important factor than language mismatch for obtaining consistent gain from stacked bottleneck features. Our post-evaluation system, involving a smaller number of component systems, can outperform our submitted systems, which performed the best for the task.

show abstract

Language independent query-by-example spoken term detection using N-best phone sequences and partial matching

Cited by 14 publications

References 22 publications

Low-resource keyword search strategies for tamil

Low-resource keyword search strategies for tamil

Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages

Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis

Contact Info

Product

Resources

About