Geographic Language Models for Automatic Speech Recognition

Xiao, Xiaoqiang; Chen, Hong; Zylak, Mark; Sosa, Daniela; Desu, Suma; Krishnamoorthy, Mahesh; Liu, Daben; Paulik, Matthias; Zhang, Yuchen

doi:10.1109/icassp.2018.8462550

Cited by 3 publications (4 citation statements)

References 4 publications

“…An efficient way to improve speech recognition accuracy of POI names is to utilize geo-location-dependent LMs [8,9,10,11]. For each user, Sten et al. [9] train a Geo-LM dynamically using nearby POI names and combine the Geo-LM with a baseline LM before or at decoding.…”

Section: Related Work (mentioning)
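Combining a dynamically built Geo-LM with a baseline LM, as described in the statement above, is commonly done by linear interpolation. The sketch below assumes that technique (the cited work may combine the models differently) and uses toy unigram tables in place of real n-gram LMs; `interpolate`, `geo_weight`, and the POI tokens are hypothetical.

```python
from typing import Dict

# Toy LMs: word -> probability. Real Geo-LMs would be n-gram or WFST models.
LM = Dict[str, float]


def interpolate(baseline: LM, geo: LM, geo_weight: float) -> LM:
    """Linear interpolation: P(w) = (1 - a) * P_base(w) + a * P_geo(w)."""
    if not 0.0 <= geo_weight <= 1.0:
        raise ValueError("geo_weight must lie in [0, 1]")
    vocab = set(baseline) | set(geo)
    return {
        w: (1.0 - geo_weight) * baseline.get(w, 0.0) + geo_weight * geo.get(w, 0.0)
        for w in vocab
    }


# Hypothetical example: the Geo-LM adds a nearby POI the baseline never saw.
baseline_lm = {"coffee": 0.5, "pizza": 0.4, "museum": 0.1}
geo_lm = {"coffee": 0.2, "harbor_mall": 0.8}
mixed = interpolate(baseline_lm, geo_lm, geo_weight=0.3)
print(round(mixed["harbor_mall"], 2))  # 0.24
```

Because both inputs sum to 1, so does the mixture; a local POI unseen by the baseline receives probability `geo_weight * P_geo(w)` instead of zero.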

“…For each user, Sten et al. [9] train a Geo-LM dynamically using nearby POI names and combine the Geo-LM with a baseline LM before or at decoding. In [11], a class-based Geo-LM is constructed dynamically for each user depending on the user's geographic location, within a difference-LM-based weighted finite state transducer (WFST) system. All the above approaches construct LMs or WFSTs on the fly according to users' geographical locations, which is time-consuming and makes it hard to incorporate many POI names into a Geo-LM.…”

Section: Related Work (mentioning)
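The class-based Geo-LM idea attributed to [11] can be illustrated with a class token whose members depend on the user's location. This is a sketch of the general class-based technique under stated assumptions, not the cited difference-LM/WFST construction: a shared LM predicts a `$POI` class token, and a per-region table (hypothetical names and probabilities) expands it into local POI names.

```python
from typing import Dict

# Hypothetical class-level LM: next-token probabilities after a fixed
# history such as "navigate to". "$POI" is a placeholder class token.
CLASS_LM: Dict[str, float] = {"home": 0.2, "work": 0.3, "$POI": 0.5}

# Hypothetical per-region class members with within-class probabilities.
# Only this table varies with the user's location; CLASS_LM is shared.
REGION_POIS: Dict[str, Dict[str, float]] = {
    "region_a": {"central_station": 0.6, "harbor_mall": 0.4},
    "region_b": {"north_library": 1.0},
}


def word_prob(word: str, region: str) -> float:
    """P(word) = P(word as a plain token) + P($POI) * P(word | $POI, region)."""
    direct = 0.0 if word == "$POI" else CLASS_LM.get(word, 0.0)
    members = REGION_POIS.get(region, {})
    return direct + CLASS_LM["$POI"] * members.get(word, 0.0)


print(word_prob("central_station", "region_a"))  # 0.3
print(word_prob("central_station", "region_b"))  # 0.0
```

Within one region the expanded distribution still sums to 1, and a POI from another region gets no class mass.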

Improving Speech Recognition Accuracy of Local POI Using Geographical Models

Cao, Zhang, Feng, et al. 2021

2021 IEEE Spoken Language Technology Workshop (SLT)

Nowadays voice search for points of interest (POI) is becoming increasingly popular. However, speech recognition for local POI names still remains a challenge due to multiple dialects and the long-tailed distribution of POI names. This paper improves speech recognition accuracy for local POI in two ways. Firstly, a geographic acoustic model (Geo-AM) is proposed. The proposed Geo-AM deals with the multi-dialect problem using dialect-specific input features and dialect-specific top layers. Secondly, a group of geo-specific language models (Geo-LMs) is integrated into our speech recognition system to improve recognition accuracy of long-tailed and homophone POI names. During decoding, a specific Geo-LM is selected on demand according to the user's geographic location. Experiments show that the proposed Geo-AM achieves 6.5%∼10.1% relative character error rate (CER) reduction on an accent test set, and the proposed Geo-AM and Geo-LMs together achieve over 18.7% relative CER reduction on a voice search task for Tencent Map.
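The abstract says a specific Geo-LM is selected on demand from the user's geographic location but does not spell out the selection rule. The sketch below assumes a simple nearest-region rule: pick the Geo-LM whose region centroid is closest by great-circle (haversine) distance, and fall back to a general LM beyond a cutoff radius. The LM identifiers, the approximate city-centre coordinates, and the 150 km cutoff are illustrative assumptions.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical Geo-LM registry: LM id -> approximate region centroid.
REGION_CENTROIDS: Dict[str, LatLon] = {
    "geo_lm_beijing": (39.90, 116.41),
    "geo_lm_shenzhen": (22.54, 114.06),
    "geo_lm_chengdu": (30.57, 104.07),
}


def select_geo_lm(user: LatLon, max_km: float = 150.0,
                  fallback: str = "general_lm") -> str:
    """Pick the nearest region's Geo-LM, or the fallback LM if none is close."""
    best_id, best_km = fallback, math.inf
    for lm_id, centroid in REGION_CENTROIDS.items():
        dist = haversine_km(user, centroid)
        if dist < best_km:
            best_id, best_km = lm_id, dist
    return best_id if best_km <= max_km else fallback


print(select_geo_lm((39.95, 116.30)))  # geo_lm_beijing
print(select_geo_lm((31.23, 121.47)))  # general_lm
```

The returned identifier would then be used to activate the corresponding Geo-LM; how Cao et al. integrate the selected model into decoding is not shown here.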

“…More specifically, knowledge about the user's interests, which may eventually lead to an interaction with a specific intent, can be helpful to improve user experience. Xiao et al. [39] improve ASR by bucketing users according to their coarse geographic location and enabling region-specific query LMs during the ASR decoding process. By personalizing the ASR query model based on user location, they show a significant improvement in the accurate recognition of spoken point-of-interest queries.…”

Section: Personalization (mentioning)
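Bucketing users by coarse geographic location can be sketched with a fixed latitude/longitude grid. The grid, the 1-degree cell size, the bucket keys, and the LM names below are assumptions for illustration only; the statement above does not say how the regions are defined, and a real system might instead use administrative or metro-area boundaries.

```python
import math
from typing import Dict, Tuple

Bucket = Tuple[int, int]


def coarse_bucket(lat: float, lon: float, cell_deg: float = 1.0) -> Bucket:
    """Quantize a coordinate to a (row, col) cell of a cell_deg-degree grid."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    # math.floor rounds toward -inf, so lon -74.01 falls in column -75.
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


# Hypothetical registry of region-specific query LMs keyed by grid cell.
REGION_LMS: Dict[Bucket, str] = {
    (37, -123): "region_lm_sf",
    (40, -75): "region_lm_nyc",
}


def lm_for_user(lat: float, lon: float, default: str = "global_lm") -> str:
    """Return the region-specific LM for the user's cell, else a global LM."""
    return REGION_LMS.get(coarse_bucket(lat, lon), default)


print(lm_for_user(37.77, -122.42))  # region_lm_sf
print(lm_for_user(0.5, 0.5))        # global_lm
```

Fixed cells are cheap to compute and need no boundary data, at the cost of splitting some metro areas across cells.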

“…3.1.1 Improving the ASR decoding process. At recognition time, contextual signals, such as partial recognition hypotheses [25] or the user location [39], can be used to modify the search space. Pusateri et al. [25] combine multiple domain-specific expert n-gram LMs into a single LM by weighting the expert LMs based on the confidence expressed by each expert LM on how well they support specific left spoken contexts.…”

Section: Open Problems and Challenges 3.1 Use of Query Domain Classifi... (mentioning)
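Weighting expert LMs by how well each supports the left context can be sketched as Bayesian-style interpolation: P(w | h) = Σ_i w_i(h) · P_i(w | h), with w_i(h) ∝ λ_i · P_i(h), where λ_i is a prior and P_i(h) is the probability expert i assigns to the history. This is one standard formulation chosen for illustration, not necessarily the exact weighting used by Pusateri et al.; the toy bigram experts, priors, and smoothing floor are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class BigramExpert:
    """Toy expert LM: P(word | previous word), with "<s>" as sentence start."""

    probs: Dict[Tuple[str, str], float]
    floor: float = 1e-6  # hypothetical floor for unseen bigrams (not normalized)

    def prob(self, prev: str, word: str) -> float:
        return self.probs.get((prev, word), self.floor)

    def history_prob(self, history: Sequence[str]) -> float:
        """Probability this expert assigns to the left context (chain rule)."""
        p, prev = 1.0, "<s>"
        for w in history:
            p *= self.prob(prev, w)
            prev = w
        return p


def mixture_weights(experts: Sequence[BigramExpert], priors: Sequence[float],
                    history: Sequence[str]) -> List[float]:
    """w_i(h) proportional to prior_i * P_i(h), normalized to sum to 1."""
    scores = [lam * e.history_prob(history) for e, lam in zip(experts, priors)]
    total = sum(scores)
    if total <= 0.0:
        raise ValueError("no expert supports this history")
    return [s / total for s in scores]


def mixed_prob(experts: Sequence[BigramExpert], priors: Sequence[float],
               history: Sequence[str], word: str) -> float:
    """P(word | h) = sum_i w_i(h) * P_i(word | h)."""
    weights = mixture_weights(experts, priors, history)
    prev = history[-1] if history else "<s>"
    return sum(w * e.prob(prev, word) for w, e in zip(weights, experts))


maps = BigramExpert({("<s>", "directions"): 0.5, ("directions", "to"): 0.9,
                     ("to", "central_station"): 0.3})
music = BigramExpert({("<s>", "play"): 0.6, ("play", "songs"): 0.5})
experts, priors = [maps, music], [0.5, 0.5]

print(mixture_weights(experts, priors, []))  # [0.5, 0.5]
```

With an empty history the weights equal the normalized priors; after "directions to" almost all weight shifts to the maps expert, because the music expert assigns that context only floor probability.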

Predicting Entity Popularity to Improve Spoken Entity Recognition by Virtual Assistants

Gysel, Tsagkias, Pusateri, et al. 2020

Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

We focus on improving the effectiveness of a Virtual Assistant (VA) in recognizing emerging entities in spoken queries. We introduce a method that uses historical user interactions to forecast which entities will gain in popularity and become trending, and it subsequently integrates the predictions within the Automated Speech Recognition (ASR) component of the VA. Experiments show that our proposed approach results in a 20% relative reduction in errors on emerging entity name utterances without degrading the overall recognition quality of the system.
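The improvements reported on this page are relative reductions (20% relative here; over 18.7% relative CER reduction in Cao et al.). A relative reduction is measured against the baseline error rate, not in absolute percentage points. The helper names and the 10.0% baseline below are hypothetical and serve only to show the arithmetic; they are not figures from either paper.

```python
def relative_reduction(baseline_err: float, new_err: float) -> float:
    """Relative error reduction: (baseline - new) / baseline."""
    if baseline_err <= 0.0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_err - new_err) / baseline_err


def error_after(baseline_err: float, rel_reduction: float) -> float:
    """Error rate remaining after a given relative reduction."""
    return baseline_err * (1.0 - rel_reduction)


# Hypothetical baseline of 10.0% error, not taken from either paper.
print(round(error_after(10.0, 0.20), 2))  # 8.0
```

So a 20% relative reduction from a 10.0% baseline leaves 8.0%, an absolute drop of only 2 points.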
