Interspeech 2018
DOI: 10.21437/interspeech.2018-2453

Semantic Lattice Processing in Contextual Automatic Speech Recognition for Google Assistant

Abstract: Recent interest in intelligent assistants has increased demand for Automatic Speech Recognition (ASR) systems that can utilize contextual information to adapt to the user's preferences or the current device state. For example, a user might be more likely to refer to their favorite songs when giving a "music playing" command or request to watch a movie starring a particular favorite actor when giving a "movie playing" command. Similarly, when a device is in a "music playing" state, a user is more likely to give…

Cited by 26 publications (19 citation statements)
References 16 publications
“…While they do not use a KG as their knowledge source, [7], [8], [9] and [10] share many similarities with our work. The models in all of those works can be viewed as log-linear models which consume n-gram features.…”
Section: Related Work (mentioning)
confidence: 91%
“…The approaches in [7], [8], [9] additionally differ in that the models are applied early in the recognition process rather than during lattice rescoring (an advantage, all else equal.) The approach in [10] is especially similar to our work in that their analog to n-gram features can contain non-terminals. However, in [10], lattices are semantically tagged to determine the locations of the non-terminals before being rescored, and their n-gram features do not capture entity-entity relationships.…”
Section: Related Work (mentioning)
confidence: 97%
“…As of 2017, the word accuracy of the Google API is estimated at 95% for U.S. English (4.9% WER) [33], which makes it the first speech recognition framework to score below 5%. Alternative tests [34,35] have shown a higher WER (7.4% and 13.5%, respectively).…”
Section: Module Architecture (mentioning)
confidence: 99%
“…Weighted automata are a popular weighted language model in natural language processing. They have found use across the discipline both alone (Mohri et al, 2002) and in conjunction with more complicated language models (Ghazvininejad et al, 2016; Velikovich et al, 2018). As such, finding efficient algorithms for weighted automata has become an intensely studied topic (Allauzen and Mohri, 2009; Argueta and Chiang, 2018).…”
Section: Introduction (mentioning)
confidence: 99%