Exploiting diversity for spoken term detection

Mangu, Lidia; Soltau, Hagen; Kuo, Hong-Kwang Jeff; Kingsbury, Brian; Saon, George

doi:10.1109/icassp.2013.6639280

Cited by 43 publications

(46 citation statements)

References 15 publications


“…However, this work was performed on large-resource languages and most of the effort focused on clean speech. It has recently been demonstrated that significant improvement on STD task can be obtained by deliberately designing diverse and complementary ASR components (i.e., front ends, acoustic models, etc) [2]. We show that similar approach works on noisy speech for low-resource languages with low target false alarm rate.…”

Section: Introduction (mentioning)

confidence: 89%

“…This normalization scheme was proposed for IR data fusion in [9] and showed improvement for meta-search. It was used successfully for the first time in STD in [2]. A variant of scheme was initially investigated for IR in [10].…”

Section: Score Normalization Methodologies (mentioning)

confidence: 99%

“…The work presented here shows that the combination strategy is useful on a very different task with very different challenges. Recently, it has been demonstrated, as part of the DARPA RATS program, that good KWS performance on STD can be obtained by combining ASR systems [2]. However, in the RATS task, the main challenges are severe noise and channel distortion, while in the Babel task, the main challenges are speaker variability and severely limited LM training data.…”

Section: Related Work (mentioning)

confidence: 99%


System combination and score normalization for spoken term detection

Mamou, Cui, et al. 2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

Self Cite


Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval for Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods improving keyword search performance. We apply these methods to Cantonese, a language that presents some new issues in terms of reduced resources and shorter query lengths. First, we show score normalization methodology that improves in average by 20% keyword search performance. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system.


“…Before MTWV scoring these values were further normalised using a sum-to-one approach which ensures that the sum over the test set of the scores for each keyword sum to unity. More details of the approach are given in [29].…”

Section: Abstractearch System (mentioning)

confidence: 99%
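The sum-to-one normalisation described in the statement above can be illustrated with a minimal Python sketch. It assumes each detection is a hypothetical `(keyword, location, score)` tuple and that scores are non-negative; the function name, the data layout, and the handling of a zero total are illustrative assumptions, not details taken from the cited papers.

```python
from collections import defaultdict


def sum_to_one_normalize(hits):
    """Rescale detection scores so that, for each keyword, they sum to 1.

    `hits` is a list of (keyword, location, score) tuples. This layout is
    an assumption made for illustration only.
    """
    # First pass: total score per keyword over the whole test set.
    totals = defaultdict(float)
    for keyword, _, score in hits:
        totals[keyword] += score

    # Second pass: divide each score by its keyword's total.
    normalized = []
    for keyword, location, score in hits:
        total = totals[keyword]
        # Assumption: leave the score unchanged if the total is zero,
        # to avoid dividing by zero.
        new_score = score / total if total > 0 else score
        normalized.append((keyword, location, new_score))
    return normalized


# Hypothetical detections: two hits for "babel", one for "speech".
hits = [
    ("babel", "utt1@3.2s", 0.6),
    ("babel", "utt7@10.5s", 0.2),
    ("speech", "utt2@1.0s", 0.9),
]
for hit in sum_to_one_normalize(hits):
    print(hit)
```

After normalisation, a keyword with a single detection receives a score of 1, while a keyword with many detections has its scores spread across them, which is what makes a single decision threshold more comparable across keywords.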

Investigation of multilingual deep neural networks for spoken term detection

Knill, Gales, Rath, et al. 2013

2013 IEEE Workshop on Automatic Speech Recognition and Understanding


The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This paper presents an investigation into the application of these multilingual approaches to spoken term detection. Experiments were run using the IARPA Babel limited language pack corpora (∼10 hours/language) with 4 languages for initial multilingual system development and an additional held-out target language. STT gains achieved through using multilingual bottleneck features in a Tandem configuration are shown to also apply to keyword search (KWS). Further improvements in both STT and KWS were observed by incorporating language questions into the Tandem GMM-HMM decision trees for the training set languages. Adapted hybrid systems performed slightly worse on average than the adapted Tandem systems. A language independent acoustic model test on the target language showed that retraining or adapting of the acoustic models to the target language is currently minimally needed to achieve reasonable performance.


“…The limited data corresponding to some languages covered in the program (Cantonese, Pashto, Turkish, Tagalog, Vietnamese, Assamese, Bengali, Haitian Creole, Lao, and Zulu) were used for system training. The system is based on multi-lingual bottle-neck DNNs and Hidden Markov Model Toolkit (HTK) [83] for training and decoding and the IBM keyword search system for term detection [84]. Results showed that INV term performance is good for languages (e.g., Haitian Creole) whose phonetic structure is similar to that of the languages used for system training.…”

Section: Spoken Term Detection Under the IARPA Babel Program And Open (mentioning)

confidence: 99%

Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

Tejedor, Toledano, López-Otero, et al. 2015

J AUDIO SPEECH MUSIC PROC.


Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and makes a deep analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
