Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
DOI: 10.18653/v1/2022.naacl-main.248

Multimodal Dialogue State Tracking

Abstract: Designed for tracking user goals in dialogues, a dialogue state tracker is an essential component in a dialogue system. However, the research of dialogue state tracking has largely been limited to unimodality, in which slots and slot values are limited by knowledge domains (e.g. restaurant domain with slots of restaurant name and price range) and are defined by specific database schema. In this paper, we propose to extend the definition of dialogue state tracking to multimodality. Specifically, we introduce a …
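To make the schema-defined setting in the abstract concrete, the following is a minimal sketch of a conventional (unimodal) dialogue state as a mapping from domain slots to values; the slot names and the simple overwrite rule are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch of a schema-defined (unimodal) dialogue state: a mapping
# from predefined slots to values accumulated over turns. Slot names and
# the overwrite-on-update rule are illustrative assumptions only.
from typing import Dict, Optional

RESTAURANT_SCHEMA = ("restaurant-name", "price-range", "area")

class DialogueState:
    def __init__(self) -> None:
        # Every slot starts unfilled; the schema fixes which slots exist.
        self.slots: Dict[str, Optional[str]] = {s: None for s in RESTAURANT_SCHEMA}

    def update(self, turn_slots: Dict[str, str]) -> None:
        # A turn can only fill or overwrite slots that the schema defines.
        for slot, value in turn_slots.items():
            if slot in self.slots:
                self.slots[slot] = value

state = DialogueState()
state.update({"price-range": "cheap"})
state.update({"area": "centre", "price-range": "moderate"})  # a later turn overrides
print(state.slots)
# {'restaurant-name': None, 'price-range': 'moderate', 'area': 'centre'}
```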

Cited by 9 publications (8 citation statements); references 37 publications.

Citation statements (ordered by relevance):
“…Depending on how to model the context of utterances, existing MERC methods are categorized into three classes: Recurrent-based methods (Mao et al., 2022) adopt RNN or LSTM to model the sequential context for each utterance. Transformer-based methods (Ling et al., 2022; Liang et al., 2022; Le et al., 2022) use Transformers with cross-modal attention to model the intra- and inter-speaker dependencies. Graph-based methods (Joshi et al., 2022; Fu et al., 2021) can control context information for each utterance and provide accurate intra- and inter-speaker dependencies, achieving SOTA performance on many MERC benchmark datasets.…”
Section: Multimodal Emotion Recognition (citation type: mentioning)
confidence: 99%
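As a rough illustration of the "Transformers with cross-modal attention" family mentioned in this excerpt, the sketch below lets text-utterance features attend over another modality's features via PyTorch's built-in multi-head attention; the dimensions, residual fusion, and single-layer setup are assumptions, not taken from any of the cited systems.

```python
# Hedged sketch of cross-modal attention between two modalities
# (e.g. text queries attending over audio features). Dimensions,
# residual fusion, and the single attention layer are assumptions.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4) -> None:
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # Text features act as queries; the other modality provides keys/values,
        # so each utterance's text representation is enriched with cross-modal context.
        fused, _ = self.attn(query=text, key=other, value=other)
        return self.norm(text + fused)  # residual fusion

text = torch.randn(2, 10, 256)   # (batch, utterances, dim) text features
audio = torch.randn(2, 10, 256)  # aligned audio features
out = CrossModalAttention()(text, audio)
print(out.shape)  # torch.Size([2, 10, 256])
```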
“…Another line of research (Ghazvininejad et al., 2019; Gu et al., 2019) tended to achieve a trade-off between performance and inference efficiency by using a semi-NAT architecture. Besides, many researchers had made attempts to introduce NAT into different sequence generation tasks like speech recognition (Ren et al., 2019) and dialog state tracking (Le et al., 2020). Moreover, it is worth noting that many NAT-related pre-trained generation models (Qi et al., 2020) had been proposed to improve the generation results of pre-trained models.…”
Section: Non-autoregressive Transformer (citation type: mentioning)
confidence: 99%
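To make the autoregressive vs. non-autoregressive distinction in this excerpt concrete, here is a toy sketch contrasting token-by-token decoding with a single parallel prediction over all positions; the miniature models and vocabulary are placeholders rather than any cited NAT architecture.

```python
# Toy contrast between autoregressive and non-autoregressive decoding.
# The tiny GRU/linear "decoders" and vocabulary are placeholders, not
# any cited NAT architecture.
import torch
import torch.nn as nn

vocab, dim, length = 100, 32, 6
enc = torch.randn(1, length, dim)        # stand-in for encoder outputs

# Autoregressive: emit one token per step, conditioned on the previous token.
emb = nn.Embedding(vocab, dim)
ar_cell = nn.GRUCell(dim, dim)
ar_head = nn.Linear(dim, vocab)
h = enc.mean(dim=1)                      # crude context initialisation
prev = torch.zeros(1, dtype=torch.long)  # BOS placeholder token id
ar_tokens = []
for _ in range(length):
    h = ar_cell(emb(prev), h)            # depends on what was emitted so far
    prev = ar_head(h).argmax(-1)
    ar_tokens.append(prev)

# Non-autoregressive: predict every position in one parallel pass over the
# encoder states -- no sequential dependency, hence faster inference.
nat_head = nn.Linear(dim, vocab)
nat_tokens = nat_head(enc).argmax(-1)    # shape (1, length)

print(torch.stack(ar_tokens, dim=1).shape, nat_tokens.shape)
```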
“…Firstly, the natural language understanding (NLU) module (Abro et al., 2022) converts user requests into semantic slots, domain information, and user intention. Secondly, the dialogue state tracking (DST) module (Wu et al., 2019; Le et al., 2019; Lin et al., 2021; Heck et al., 2023) extracts the dialogue state, which records user requests in the form of slot-value pairs. The dialogue policy learning (POL) module (Chen et al., 2017; Geishauser et al., 2022) determines the next action of the dialogue agent based on the dialogue state.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
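The modular NLU → DST → POL flow described in this excerpt can be sketched with rule-based stand-ins; the keyword rules and slot names below are illustrative assumptions, not the learned modules cited above.

```python
# Skeletal task-oriented dialogue pipeline: NLU -> DST -> POL.
# The rule-based stand-ins only illustrate the data flow between modules;
# real systems use learned models for each stage.
from typing import Dict

def nlu(utterance: str) -> Dict[str, str]:
    """Toy NLU: map keywords to (slot, value) pairs."""
    found = {}
    if "cheap" in utterance:
        found["price-range"] = "cheap"
    if "centre" in utterance:
        found["area"] = "centre"
    return found

def dst(state: Dict[str, str], turn_slots: Dict[str, str]) -> Dict[str, str]:
    """Toy DST: accumulate slot-value pairs across turns."""
    return {**state, **turn_slots}

def pol(state: Dict[str, str]) -> str:
    """Toy policy: request missing information, otherwise recommend."""
    if "area" not in state:
        return "request(area)"
    return "recommend(restaurant)"

state: Dict[str, str] = {}
for user_turn in ["I want something cheap", "somewhere in the centre"]:
    state = dst(state, nlu(user_turn))
    print(pol(state))
# request(area)
# recommend(restaurant)
```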