ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9053781

Lattice-Based Improvements for Voice Triggering Using Graph Neural Networks

Abstract: Voice-triggered smart assistants often rely on detection of a trigger phrase before they start listening for the user request. Mitigation of false triggers is an important aspect of building a privacy-centric non-intrusive smart assistant. In this paper, we address the task of false trigger mitigation (FTM) using a novel approach based on analyzing automatic speech recognition (ASR) lattices using graph neural networks (GNN). The proposed approach uses the fact that decoding lattice of a falsely triggered audio…

Citations: cited by 5 publications (9 citation statements)
References: 13 publications (21 reference statements)
“…Lattice embeddings are obtained by treating the lattice as a graph and processing it using multiple hidden layers of multi-headed self-attention operation. These embeddings have been shown to be highly informative for FTM task [12,1], but they can be obtained only by running full-fledged ASR on the audio which is expensive to be run on-device and invades user privacy in case of a false trigger. Moreover, the LatticeGNN model needs to be retrained if the distribution of the input lattice features changes due to any changes in the acoustic model, language model or the ASR decoding parameters.…”
Section: LatticeGNN FTM and Lattice Embeddings (mentioning)
confidence: 99%
“…Other prior approaches for device-directed utterance detection includes various trigger-phrase detection techniques explored in [7,8,9,10,11]. Lattice-based techniques which complement trigger-phrase detection systems have been explored in [12,6,1].…”
Section: Introduction (mentioning)
confidence: 99%
“…All our experiments are performed on an FTM dataset [8], which is composed of far field usage samples with manual labels of "true trigger" (TT) and "false trigger" (FT) classes. The raw audio data are split into train, cv, dev, and eval sets for the purposes of training, cross-validation, development and evaluation.…”
Section: FTM Dataset and Evaluation Metrics (mentioning)
confidence: 99%
“…Thus a classifier built on top of the Bi-LRNN is able to mitigate the false trigger cases significantly. A recent work [8] explored the use of graph neural networks (GNN) to encode the decoding lattice, which achieves similar accuracy as the Bi-LRNN representation with more efficient training.…”
Section: Introduction (mentioning)
confidence: 99%
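
One of the citation statements above describes lattice embeddings as the result of treating the lattice as a graph and passing it through several hidden layers of multi-headed self-attention. The sketch below illustrates that general idea only; it is not the paper's LatticeGNN model. It assumes features live on lattice nodes, uses no learned projection matrices (queries, keys and values are the raw feature slices), lets each node attend to itself and to nodes it shares an arc with, and mean-pools the final node states into a single embedding. All function names and the toy lattice are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighbours(num_nodes, arcs):
    """For each node, the sorted set of nodes it may attend to:
    itself plus any node it shares an arc with (assumption: direction ignored)."""
    adj = [{i} for i in range(num_nodes)]
    for src, dst in arcs:
        adj[src].add(dst)
        adj[dst].add(src)
    return [sorted(a) for a in adj]


def attention_layer(features, arcs, num_heads=2):
    """One masked multi-head self-attention layer without learned weights.
    Each head works on its own contiguous slice of the feature vector."""
    dim = len(features[0])
    assert dim % num_heads == 0, "feature size must divide evenly across heads"
    head_dim = dim // num_heads
    adj = neighbours(len(features), arcs)
    out = []
    for i, nbrs in enumerate(adj):
        row = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            query = features[i][lo:hi]
            # Scaled dot-product scores against the allowed neighbours only.
            scores = [
                sum(a * b for a, b in zip(query, features[j][lo:hi]))
                / math.sqrt(head_dim)
                for j in nbrs
            ]
            weights = softmax(scores)
            for d in range(lo, hi):
                row.append(sum(w * features[j][d] for w, j in zip(weights, nbrs)))
        out.append(row)
    return out


def lattice_embedding(features, arcs, num_layers=2, num_heads=2):
    """Stack attention layers, then mean-pool node states into one vector."""
    hidden = features
    for _ in range(num_layers):
        hidden = attention_layer(hidden, arcs, num_heads)
    dim = len(hidden[0])
    return [sum(node[d] for node in hidden) / len(hidden) for d in range(dim)]


# Toy 4-node lattice with two competing paths from node 0 to node 3.
features = [
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.5],
    [1.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 1.0, 0.0],
]
arcs = [(0, 1), (0, 2), (1, 3), (2, 3)]
embedding = lattice_embedding(features, arcs)
```

A downstream true-trigger/false-trigger classifier would consume `embedding`; a trained model would additionally learn projection weights per head and layer, which this sketch omits.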