Proceedings of the 4th Workshop on E-Commerce and NLP 2021
DOI: 10.18653/v1/2021.ecnlp-1.3
ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling

Abstract: Automatic Speech Recognition (ASR) robustness toward slot entities is critical in e-commerce voice assistants that involve monetary transactions and purchases. Along with effective domain adaptation, it is intuitive that cross-utterance contextual cues play an important role in disambiguating domain-specific content words from speech. In this paper, we investigate various techniques to improve contextualization, content word robustness, and domain adaptation of a Transformer-XL neural language model (NLM) to re…

Cited by 7 publications (4 citation statements); References 28 publications.
“…The proliferation of conversational agents (CAs), also known as chatbots or dialog systems, has been spurred by advancements in Natural Language Processing (NLP) technologies. Their application spans diverse sectors, from education (Okonkwo and Ade-Ibijola, 2021; Durall and Kapros, 2020) to e-commerce (Shenoy et al., 2021), demonstrating their increasing ubiquity and potency.…”
Section: Introduction
“…Maintaining multiple domain-adapted copies of these LMs is not scalable, as it involves large memory, compute, and maintenance costs. On the other hand, a common version of such an LM for all the domains falls short of domain-specific LMs in performance [7,8]. Therefore, the need for a middle ground between performance and cost is evident.…”
Section: Introduction
“…Recent language modeling literature [9,10,11,12,13,8] includes novel methodologies for a related problem: efficiently adapting large LMs to specific tasks. Instead of fine-tuning and storing millions of parameters for each task, these works propose using a common, task-agnostic copy of the LM with a limited number of additional parameters per task.…”
Section: Introduction
“…Various methods have been proposed to mitigate this issue, ranging from mixtures of domain experts [4], context-based interpolation weights [5], and second-pass rescoring through domain-adapted models [6] to feature-based domain adaptation [7]. In [8,9], user-provided speech patterns were leveraged for on-the-fly adaptation. Yet another way to solve this problem is ASR error correction.…”
Section: Introduction
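Second-pass rescoring, one of the techniques the excerpt lists, can be shown with a toy example. This is a sketch under stated assumptions: the n-best list, the interpolation weight, and the `toy_domain_lm` scorer (a stand-in for a real domain-adapted neural LM) are all illustrative, not taken from the cited works.

```python
def rescore_nbest(nbest, domain_lm_score, lm_weight=0.5):
    """Re-rank first-pass ASR hypotheses by linearly interpolating the
    first-pass score with a score from a domain-adapted LM."""
    rescored = [
        (text, (1 - lm_weight) * score + lm_weight * domain_lm_score(text))
        for text, score in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy domain "LM": rewards in-domain slot words (hypothetical vocabulary).
DOMAIN_WORDS = {"order", "cart", "checkout"}

def toy_domain_lm(text):
    return sum(1.0 for w in text.split() if w in DOMAIN_WORDS)

# First-pass n-best list with (hypothesis, log-score); the acoustically
# preferred "odor" loses to "order" once domain knowledge is interpolated.
nbest = [("track my odor", -2.0), ("track my order", -2.5)]
best = rescore_nbest(nbest, toy_domain_lm)[0][0]  # → "track my order"
```

The design point is that the first pass stays untouched; only the hypothesis ranking changes, which is why rescoring is a common low-risk way to inject domain knowledge into an existing ASR system.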