Personalizing Natural Language Understanding using Multi-armed Bandits and Implicit Feedback

Moerchen, Fabian; Ernst, Patrick; Zappella, Giovanni

doi:10.1145/3340531.3412736

Cited by 7 publications

(2 citation statements)

References 21 publications

“…The first is the Explore-Then-Commit Algorithm. The core idea of ETC algorithm is to explore by playing each arm a fixed number of times and then exploit by committing to the arm that appeared the best during exploration, which is the same idea as AB testing [7]. During the exploration phase, the learner chooses each arm in a round-robin fashion until all 𝑘 arms are selected 𝑚 times each.…”

Section: Explore-then-commit Algorithm (mentioning)

confidence: 99%

Applying Multi-Armed Bandit algorithms for music recommendations at Spotify

Xia

2024

ACE

This study explores the application of multi-armed bandit algorithms in enhancing music recommendation systems, with a focus on Spotify. It delves into the Explore-Then-Commit (ETC), Upper Confidence Bound (UCB), and Thompson Sampling (TS) algorithms, evaluating their efficacy within the Spotify context. The primary objective is to determine which algorithm optimally balances exploration and exploitation to maximize user satisfaction and engagement. The research reveals that the ETC algorithm, with its rigid exploration and exploitation phases, incurs a notably higher regret value. This rigidity can lead to missed opportunities in identifying optimal choices and hinder adaptability to evolving user preferences. Conversely, the UCB and TS algorithms exhibit remarkable adaptability and a flexible balance between exploration and exploitation. This flexibility translates into more personalized and satisfactory user experiences in music recommendations. However, the selection of the most appropriate algorithm should be contingent on the size and characteristics of the specific user dataset, as well as the fine-tuning of algorithm parameters to align with user preferences and behaviors.
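The explore-then-commit procedure described in the citation statement above (round-robin exploration until each of the 𝑘 arms has been played 𝑚 times, then commitment to the empirically best arm) can be sketched as follows. This is a minimal illustration, not code from either paper: the function names `explore_then_commit` and `bernoulli_arm`, the reward probabilities, and the values of `m` and `horizon` are all assumptions chosen for the example.

```python
import random


def explore_then_commit(arms, m, horizon):
    """Run explore-then-commit on `arms` for `horizon` rounds.

    arms    : list of zero-argument callables, each returning a numeric reward
    m       : number of exploration pulls per arm (must be >= 1)
    horizon : total number of rounds
    Returns (choices, committed_arm); committed_arm is None if the horizon
    ends before the exploration phase finishes.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    k = len(arms)
    totals = [0.0] * k   # summed exploration reward per arm
    choices = []         # arm index played in each round
    committed = None

    for t in range(horizon):
        if t < m * k:
            # Exploration: cycle through the arms in round-robin order.
            arm = t % k
            totals[arm] += arms[arm]()
        else:
            # Exploitation: pick the arm with the best exploration mean once,
            # then keep playing it for the rest of the horizon.
            if committed is None:
                committed = max(range(k), key=lambda i: totals[i] / m)
            arm = committed
            arms[arm]()
        choices.append(arm)
    return choices, committed


def bernoulli_arm(p, rng):
    """Hypothetical arm that pays 1.0 with probability p, else 0.0."""
    return lambda: 1.0 if rng.random() < p else 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    arms = [bernoulli_arm(p, rng) for p in (0.2, 0.5, 0.7)]
    choices, committed = explore_then_commit(arms, m=50, horizon=1000)
    print("committed arm:", committed)
```

Because the commitment is made once and never revisited, a small `m` can lock in a suboptimal arm, which is consistent with the rigidity the abstract attributes to ETC.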

“…Falke et al (2020) leverage user paraphrasing behavior in dialog systems to automatically collect annotations for long-tail utterances. Moerchen et al (2020) present an approach where implicit negative feedback from the user is used to train a re-ranker that is then applied to pick correct annotations for under-performing utterances.…”

Section: Related Work (mentioning)

confidence: 99%

Improving Large-Scale Conversational Assistants using Model Interpretation based Training Sample Selection

Schroedl, Kumar, Hajebi et al.

2022

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

Natural language understanding (NLU) models are a core component of large-scale conversational assistants. Collecting training data for these models through manual annotation is slow and expensive, which impedes the pace of model improvement. We present a three-stage approach to address this challenge. First, we identify a large set of relatively infrequent utterances from live traffic where the users implicitly communicated satisfaction with a response (such as by not interrupting), along with the existing model outputs as candidate annotations. Second, we identify a small subset of these utterances using Integrated Gradients-based importance scores computed with the current models. Finally, we augment our training sets with these utterances and retrain our models. We demonstrate the effectiveness of our approach in a large-scale conversational assistant that processes billions of utterances every week. By augmenting our training set with just 0.05% more utterances through our approach, we observe statistically significant improvements for infrequent tail utterances: a 0.45% reduction in semantic error rate (SemER) in offline experiments, and a 1.23% reduction in defect rates in online A/B tests.

Reinforcement learning and bandits for speech and language processing: Tutorial, review and outlook

Lin

2024

Expert Systems with Applications
