Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
DOI: 10.18653/v1/2022.naacl-main.248

Multimodal Dialogue State Tracking

Abstract: Designed for tracking user goals in dialogues, a dialogue state tracker is an essential component in a dialogue system. However, the research of dialogue state tracking has largely been limited to unimodality, in which slots and slot values are limited by knowledge domains (e.g. restaurant domain with slots of restaurant name and price range) and are defined by specific database schema. In this paper, we propose to extend the definition of dialogue state tracking to multimodality. Specifically, we introduce a …
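To make the schema-defined setting in the abstract concrete, the following is a minimal sketch of a conventional (unimodal) dialogue state as a mapping from domain slots to values; the slot names and the simple overwrite rule are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch of a schema-defined (unimodal) dialogue state: a mapping
# from predefined slots to values accumulated over turns. Slot names and
# the overwrite-on-update rule are illustrative assumptions only.
from typing import Dict, Optional

RESTAURANT_SCHEMA = ("restaurant-name", "price-range", "area")

class DialogueState:
    def __init__(self) -> None:
        # Every slot starts unfilled; the schema fixes which slots exist.
        self.slots: Dict[str, Optional[str]] = {s: None for s in RESTAURANT_SCHEMA}

    def update(self, turn_slots: Dict[str, str]) -> None:
        # A turn can only fill or overwrite slots that the schema defines.
        for slot, value in turn_slots.items():
            if slot in self.slots:
                self.slots[slot] = value

state = DialogueState()
state.update({"price-range": "cheap"})
state.update({"area": "centre", "price-range": "moderate"})  # a later turn overrides
print(state.slots)
# {'restaurant-name': None, 'price-range': 'moderate', 'area': 'centre'}
```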

Cited by 9 publications (8 citation statements); references 37 publications.

Citation statements (ordered by relevance):
“…Depending on how to model the context of utterances, existing MERC methods are categorized into three classes: Recurrent-based methods (Mao et al., 2022) adopt RNN or LSTM to model the sequential context for each utterance. Transformer-based methods (Ling et al., 2022; Liang et al., 2022; Le et al., 2022) use Transformers with cross-modal attention to model the intra- and inter-speaker dependencies. Graph-based methods (Joshi et al., 2022; Fu et al., 2021) can control context information for each utterance and provide accurate intra- and inter-speaker dependencies, achieving SOTA performance on many MERC benchmark datasets.…”
Section: Multimodal Emotion Recognition (citation type: mentioning)
confidence: 99%
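As a rough illustration of the "Transformers with cross-modal attention" family mentioned in this excerpt, the sketch below lets text-utterance features attend over another modality's features via PyTorch's built-in multi-head attention; the dimensions, residual fusion, and single-layer setup are assumptions, not taken from any of the cited systems.

```python
# Hedged sketch of cross-modal attention between two modalities
# (e.g. text queries attending over audio features). Dimensions,
# residual fusion, and the single attention layer are assumptions.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4) -> None:
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # Text features act as queries; the other modality provides keys/values,
        # so each utterance's text representation is enriched with cross-modal context.
        fused, _ = self.attn(query=text, key=other, value=other)
        return self.norm(text + fused)  # residual fusion

text = torch.randn(2, 10, 256)   # (batch, utterances, dim) text features
audio = torch.randn(2, 10, 256)  # aligned audio features
out = CrossModalAttention()(text, audio)
print(out.shape)  # torch.Size([2, 10, 256])
```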
“…Another line of research (Ghazvininejad et al., 2019; Gu et al., 2019) tended to achieve a trade-off between performance and inference efficiency by using a semi-NAT architecture. Besides, many researchers had made attempts to introduce NAT into different sequence generation tasks like speech recognition (Ren et al., 2019) and dialog state tracking (Le et al., 2020). Moreover, it is worth noting that many NAT-related pre-trained generation models (Qi et al., 2020) had been proposed to improve the generation results of pre-trained models.…”
Section: Non-autoregressive Transformer (citation type: mentioning)
confidence: 99%
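To make the autoregressive vs. non-autoregressive distinction in this excerpt concrete, here is a toy sketch contrasting token-by-token decoding with a single parallel prediction over all positions; the miniature models and vocabulary are placeholders rather than any cited NAT architecture.

```python
# Toy contrast between autoregressive and non-autoregressive decoding.
# The tiny GRU/linear "decoders" and vocabulary are placeholders, not
# any cited NAT architecture.
import torch
import torch.nn as nn

vocab, dim, length = 100, 32, 6
enc = torch.randn(1, length, dim)        # stand-in for encoder outputs

# Autoregressive: emit one token per step, conditioned on the previous token.
emb = nn.Embedding(vocab, dim)
ar_cell = nn.GRUCell(dim, dim)
ar_head = nn.Linear(dim, vocab)
h = enc.mean(dim=1)                      # crude context initialisation
prev = torch.zeros(1, dtype=torch.long)  # BOS placeholder token id
ar_tokens = []
for _ in range(length):
    h = ar_cell(emb(prev), h)            # depends on what was emitted so far
    prev = ar_head(h).argmax(-1)
    ar_tokens.append(prev)

# Non-autoregressive: predict every position in one parallel pass over the
# encoder states -- no sequential dependency, hence faster inference.
nat_head = nn.Linear(dim, vocab)
nat_tokens = nat_head(enc).argmax(-1)    # shape (1, length)

print(torch.stack(ar_tokens, dim=1).shape, nat_tokens.shape)
```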
“…Firstly, the natural language understanding (NLU) module (Abro et al., 2022) converts user requests into semantic slots, domain information, and user intention. Secondly, the dialogue state tracking (DST) module (Wu et al., 2019; Le et al., 2019; Lin et al., 2021; Heck et al., 2023) extracts the dialogue state, which records user requests in the form of slot-value pairs. The dialogue policy learning (POL) module (Chen et al., 2017; Geishauser et al., 2022) determines the next action of the dialogue agent based on the dialogue state.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
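The modular NLU → DST → POL flow described in this excerpt can be sketched with rule-based stand-ins; the keyword rules and slot names below are illustrative assumptions, not the learned modules cited above.

```python
# Skeletal task-oriented dialogue pipeline: NLU -> DST -> POL.
# The rule-based stand-ins only illustrate the data flow between modules;
# real systems use learned models for each stage.
from typing import Dict

def nlu(utterance: str) -> Dict[str, str]:
    """Toy NLU: map keywords to (slot, value) pairs."""
    found = {}
    if "cheap" in utterance:
        found["price-range"] = "cheap"
    if "centre" in utterance:
        found["area"] = "centre"
    return found

def dst(state: Dict[str, str], turn_slots: Dict[str, str]) -> Dict[str, str]:
    """Toy DST: accumulate slot-value pairs across turns."""
    return {**state, **turn_slots}

def pol(state: Dict[str, str]) -> str:
    """Toy policy: request missing information, otherwise recommend."""
    if "area" not in state:
        return "request(area)"
    return "recommend(restaurant)"

state: Dict[str, str] = {}
for user_turn in ["I want something cheap", "somewhere in the centre"]:
    state = dst(state, nlu(user_turn))
    print(pol(state))
# request(area)
# recommend(restaurant)
```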