2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2017.7953263

Recurrent neural network language models for keyword search

Abstract: Recurrent neural network language models (RNNLMs) have become increasingly popular in many applications, such as automatic speech recognition (ASR). Significant performance improvements in both perplexity and word error rate over standard n-gram LMs have been widely reported on ASR tasks. In contrast, published research on using RNNLMs for keyword search systems has been relatively limited. In this paper the application of RNNLMs to the IARPA Babel keyword search task is investigated. In order to supplement …
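For readers unfamiliar with the model class, the following is a minimal sketch of an Elman-style RNNLM scoring a sentence word by word. The toy vocabulary, hidden size, and untrained random weights are assumptions for illustration only; this is not the system described in the paper.

# Minimal sketch of an Elman-style RNNLM (illustrative toy, untrained weights).
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class ToyRNNLM:
    """Elman RNN LM: h_t = tanh(Wx x_t + Wh h_{t-1}); P(w_{t+1} | history) = softmax(Wo h_t)."""
    def __init__(self, vocab, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.idx = {w: i for i, w in enumerate(vocab)}
        v = len(vocab)
        self.Wx = rng.normal(0.0, 0.1, (hidden, v))       # input (one-hot) projection
        self.Wh = rng.normal(0.0, 0.1, (hidden, hidden))  # recurrent weights
        self.Wo = rng.normal(0.0, 0.1, (v, hidden))       # output projection
        self.h = np.zeros(hidden)                         # state carrying the full history

    def step(self, word):
        # Consume one word and return P(next word | entire history so far).
        x = np.zeros(len(self.idx)); x[self.idx[word]] = 1.0
        self.h = np.tanh(self.Wx @ x + self.Wh @ self.h)
        return softmax(self.Wo @ self.h)

def sentence_log_prob(lm, words):
    # Unlike an n-gram LM, the conditioning context is unbounded:
    # the hidden state h summarizes all preceding words.
    return sum(np.log(lm.step(p)[lm.idx[n]]) for p, n in zip(words, words[1:]))

vocab = ["<s>", "the", "keyword", "search", "system", "</s>"]
print(sentence_log_prob(ToyRNNLM(vocab), vocab))

The unbounded history is the source of the perplexity and word error rate gains over n-gram LMs noted in the abstract, and also the reason lattice-based rescoring (discussed in the citation statements below) needs care.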

Cited by 8 publications (9 citation statements). References 20 publications (24 reference statements).

Citation statements (ordered by relevance):
“…Lattice rescoring is impractical for bi-RNNLMs as the word probability calculations require information from the complete sentence. However, lattices are very important in a range of downstream applications, including confidence score estimation [21], keyword search [22] and confusion network decoding [23].…”
Section: Bi-directional RNNLMs
“…In these experiments the performance of su-RNNLMs, which can be applied directly to lattices, is compared to uni-RNNLMs. In [41] uni-RNNLMs were demonstrated to be effective for KWS. A total of about 50 hours of transcribed conversational telephone speech data is provided to build the ASR and keyword search systems.…”
Section: Experiments on Keyword Search
“…On the other hand, the subword-based approach has the unique advantage that it can detect terms that consist of words that are not in the vocabulary of the recognizer, i.e., out-of-vocabulary (OOV) terms. The combination of these two approaches has been proposed in order to exploit the relative advantages of word- and subword-based strategies [17, 32, 33, 36, 44, 63-70].…”
Section: Spoken Term Detection Overview
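The word/subword combination mentioned in this excerpt can be as simple as score-level merging of the two systems' detection lists. The sketch below is a hypothetical illustration, assuming a Detection record and a time-overlap rule of my own; it is not any cited paper's exact method.

# Sketch: combine word-based and subword-based keyword detections.
# The Detection record and the overlap rule are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Detection:
    keyword: str
    start: float   # seconds
    end: float     # seconds
    score: float   # detection confidence in [0, 1]

def overlaps(a: Detection, b: Detection) -> bool:
    # Two hits for the same keyword overlap in time.
    return a.keyword == b.keyword and a.start < b.end and b.start < a.end

def merge(word_hits, subword_hits):
    # Keep every word-based hit; add subword hits only where the word
    # system found nothing, so OOV keywords can still be detected.
    merged = list(word_hits)
    for sh in subword_hits:
        if not any(overlaps(sh, wh) for wh in word_hits):
            merged.append(sh)
    return sorted(merged, key=lambda d: -d.score)

word_hits = [Detection("budget", 3.1, 3.6, 0.92)]
subword_hits = [Detection("budget", 3.0, 3.5, 0.75),
                Detection("crowdfund", 7.2, 7.9, 0.61)]  # OOV for the word system
for d in merge(word_hits, subword_hits):
    print(d)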
“…It is not practical for bi-RNNLMs to be used for lattice rescoring and generation, as the complete previous and future context information is required. However, lattices are very useful in many applications, such as confidence score estimation [9], keyword search [10] and confusion network decoding [11]. In contrast, su-RNNLMs require a fixed number of succeeding words instead of the complete future context information.…”
Section: Lattice Rescoring
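The distinction drawn in this excerpt (complete future context for bi-RNNLMs versus a fixed number of succeeding words for su-RNNLMs) can be made concrete with a small sketch. The function below is hypothetical and only enumerates which words each variant conditions on when scoring position t.

# Sketch: context consumed by each RNNLM variant when scoring sentence[t].
def contexts(sentence, t, k=2):
    # k is the fixed number of succeeding words used by an su-RNNLM.
    history = sentence[:t]  # all variants see the complete past
    return {
        "uni-RNNLM": (history, []),                         # past only: lattice-friendly
        "bi-RNNLM":  (history, sentence[t + 1:]),           # needs the whole sentence
        "su-RNNLM":  (history, sentence[t + 1:t + 1 + k]),  # only k future words
    }

sent = "<s> the keyword search system works </s>".split()
for name, (past, future) in contexts(sent, t=3).items():
    print(f"{name:9s} past={past} future={future}")

Because an su-RNNLM's future dependence is bounded by k, lattice paths only need to be expanded with k succeeding words rather than whole sentences, which is what makes lattice rescoring and generation tractable again.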
“…However, the ability to manipulate lattices is very important in many speech applications. Lattices can be used for a wide range of downstream applications, such as confidence score estimation [9], keyword search [10] and confusion network decoding [11]. In order to address these issues, a novel model structure, succeeding word RNNLMs (su-RNNLMs), is proposed in this paper.…”
Section: Introduction