A keyword search system using open source software

Trmal, Jan; Chen, Guoguo; Povey, Dan; Khudanpur, Sanjeev; Ghahremani, Pegah; Zhang, Xiaohui; Manohar, Vimal; Liu, Chunxi; Jansen, Aren; Klakow, Dietrich; Yarowsky, David; Metze, Florian

doi:10.1109/slt.2014.7078630

Cited by 37 publications (20 citation statements). References 12 publications.

“…In addition to the lattice-level fusion, we performed fusion on the list level, described in [33], using Kaldi [26] for all AMs for each approach independently and for both approaches together (Table 2). The list-level combination of all the systems for both approaches provides an additional improvement in overall accuracy (MTWV=0.795), which corresponds to a 7.4% relative MTWV improvement over the best fusion result.…”

Section: Results (mentioning; confidence: 99%)

Fast and Accurate OOV Decoder on High-Level Features

Khokhlov, Tomashenko, Medennikov et al. (2017), Interspeech 2017

This work proposes a novel approach to the out-of-vocabulary (OOV) keyword search (KWS) task. The proposed approach decodes high-level features from an automatic speech recognition (ASR) system, so-called phoneme posterior based (PPB) features. These features are obtained by calculating time-dependent phoneme posterior probabilities from word lattices and then smoothing them. For the PPB features, we developed a novel OOV decoder that is very fast, simple, and efficient. Experimental results are presented on the Georgian language from the IARPA Babel Program, which was the test language in the OpenKWS 2016 evaluation campaign. The results show that, for single ASR systems, the proposed approach significantly outperforms the state-of-the-art approach based on in-vocabulary proxies for OOV keywords in the indexed database, both in terms of the maximum term weighted value (MTWV) metric and in computational speed. A comparison of the two OOV KWS approaches on the fusion of nine different ASR systems shows that the proposed OOV decoder outperforms the proxy-based approach in terms of MTWV at comparable processing speed. Other important advantages of the OOV decoder include extremely low memory consumption and a simple implementation and parameter optimization.
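The abstract describes PPB features as time-dependent phoneme posteriors computed from word lattices and then smoothed. The sketch below illustrates that idea under loud assumptions: the lattice is taken to be already expanded into phone-level segments, each carrying the forward-backward posterior of its arc; the `PhoneSegment` tuple format and the function names `frame_phone_posteriors` and `smooth` are hypothetical; and the moving-average smoother is an illustrative stand-in, not the smoothing used by Khokhlov et al.

```python
from typing import List, Tuple

# Hypothetical input: lattice arcs already expanded to phone level, each as
# (phone, start_frame, end_frame_exclusive, arc_posterior), where arc_posterior
# is the forward-backward posterior of the arc carrying that phone.
PhoneSegment = Tuple[str, int, int, float]


def frame_phone_posteriors(segments: List[PhoneSegment], phones: List[str],
                           num_frames: int) -> List[List[float]]:
    """Accumulate per-frame phone posteriors: P(phone at frame t | lattice)."""
    index = {p: i for i, p in enumerate(phones)}
    post = [[0.0] * len(phones) for _ in range(num_frames)]
    for phone, start, end, gamma in segments:
        # Every frame covered by the segment receives the arc's posterior mass.
        for t in range(start, min(end, num_frames)):
            post[t][index[phone]] += gamma
    return post


def smooth(post: List[List[float]], half_window: int) -> List[List[float]]:
    """Moving-average smoothing along time (an assumed smoother, for illustration).

    At the edges the window is truncated and the mean is taken over the
    frames that actually exist.
    """
    n = len(post)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half_window), min(n, t + half_window + 1)
        width = hi - lo
        out.append([sum(post[u][j] for u in range(lo, hi)) / width
                    for j in range(len(post[t]))])
    return out
```

If every lattice path covers every frame, each unsmoothed row sums to 1; averaging neighbouring rows preserves that, so the smoothed rows are still distributions over phones.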

“…ASR would be a hugely beneficial addition to LENA's automatic segmentation and diarization, and to other long-form audio recordings. For example, the ability to detect a specified set of keywords could be very useful for scientists interested in how natural audio environments support word learning, which should be an achievable goal [30,38].…”

Section: Case Study: the Homebank Repository (mentioning; confidence: 99%)

Virtual Machines and Containers as a Platform for Experimentation

et al. 2016 (self-citation)

Research on computational speech processing has traditionally relied on the availability of a relatively large and complex infrastructure, which encompasses data (text and audio), tools (feature extraction, model training, scoring, possibly on-line and off-line, etc.), glue code, and computing. Traditionally, it has been very hard to move experiments from one site to another, and to replicate experiments. With the increasing availability of shared platforms such as commercial cloud computing platforms or publicly funded super-computing centers, there is a need and an opportunity to abstract the experimental environment from the hardware, and distribute complete setups as a virtual machine, a container, or some other shareable resource, that can be deployed and worked with anywhere. In this paper, we discuss our experience with this concept and present some tools that the community might find useful. We outline, as a case study, how such tools can be applied to a naturalistic language acquisition audio corpus.

“…Trmal et al. [94] proposed combining ASR systems that employ different configurations in terms of acoustic features and acoustic models (e.g., subspace GMMs (SGMMs), DNNs, and bottleneck features). The Kaldi STD system [68–70] was used for term detection in all the systems.…”

Section: Spoken Term Detection Under the IARPA Babel Program And Open (mentioning; confidence: 99%)

“…The best performance in the NIST Open KWS 2013 evaluation is ATWV=0.6248 [110] under the Full Language Pack (FullLP) condition, for which 20 h of word-transcribed scripted speech, 80 h of word-transcribed CTS, and a pronunciation lexicon were given to participants. In the works describing systems on the surprise language (i.e., Tamil) of the Open KWS 2014 evaluation [53,92,94,111–117], ATWV=0.5802 is the best performance obtained under the FullLP condition, for which 60 h of transcribed speech and a pronunciation lexicon were given to participants.…”

Section: Comparison To Other Evaluations (mentioning; confidence: 99%)

Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

Tejedor, Toledano, López-Otero et al. (2015), J AUDIO SPEECH MUSIC PROC.

Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and presents an in-depth analysis based on several search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
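The MTWV and ATWV figures quoted on this page (e.g. MTWV=0.795, ATWV=0.6248, ATWV=0.5802) are term-weighted values. As used in the NIST OpenKWS evaluations, TWV(θ) = 1 - (1/|K|) Σ_k [P_miss(k,θ) + β·P_FA(k,θ)] with β = 999.9, where P_miss = 1 - N_corr/N_true and P_FA = N_FA/(T - N_true), T being the number of seconds of speech; keywords with no reference occurrences are excluded. MTWV is the maximum of TWV over all thresholds θ, while ATWV is TWV at the threshold the system actually committed to. The sketch below is a simplified scorer, assuming detections have already been aligned to the reference and marked as hits or false alarms (real scoring tools also perform that alignment); the input format and the names `twv`, `mtwv`, and `Detections` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: for each keyword, the scored detections already
# aligned against the reference, as (score, is_hit) pairs.
Detections = Dict[str, List[Tuple[float, bool]]]

BETA = 999.9  # false-alarm weight used in the Babel/OpenKWS evaluations


def twv(dets: Detections, n_true: Dict[str, int], speech_seconds: float,
        theta: float, beta: float = BETA) -> float:
    """Term-weighted value when every detection with score >= theta is kept."""
    # Keywords with no reference occurrences are excluded from the average.
    keywords = [k for k, n in n_true.items() if n > 0]
    if not keywords:
        raise ValueError("no keyword has reference occurrences")
    total_cost = 0.0
    for k in keywords:
        kept = [hit for score, hit in dets.get(k, []) if score >= theta]
        n_corr = sum(kept)            # True counts as 1
        n_fa = len(kept) - n_corr
        p_miss = 1.0 - n_corr / n_true[k]
        p_fa = n_fa / (speech_seconds - n_true[k])
        total_cost += p_miss + beta * p_fa
    return 1.0 - total_cost / len(keywords)


def mtwv(dets: Detections, n_true: Dict[str, int], speech_seconds: float,
         beta: float = BETA) -> Tuple[float, float]:
    """Maximum TWV over thresholds, returned as (mtwv, best_theta)."""
    # Only observed scores (plus "keep nothing") change the set of kept detections.
    candidates = sorted({s for ds in dets.values() for s, _ in ds}) + [float("inf")]
    best = max(candidates, key=lambda t: twv(dets, n_true, speech_seconds, t, beta))
    return twv(dets, n_true, speech_seconds, best, beta), best
```

Because β is so large, a single false alarm costs roughly as much as missing about 2.8% of a keyword's occurrences in a 10-hour collection, which is why threshold tuning matters so much for these metrics. As a worked check on the fusion result quoted at the top: if MTWV=0.795 is a 7.4% relative improvement over the best fusion result, that result was about 0.795 / 1.074 ≈ 0.740.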
