UniMF: A Unified Framework to Incorporate Multimodal Knowledge Bases into End-to-End Task-Oriented Dialogue Systems

Yang, Shiquan; Zhang, Rui; Erfani, Sarah M.; Lau, Jey Han

doi:10.24963/ijcai.2021/548

Cited by 16 publications

(3 citation statements)

References 2 publications

(2 reference statements)

“…There has been some effort to incorporate knowledge broader than what can be learned from the training dataset itself. One particular area of application is visual dialogue, where external knowledge bases have been proposed [160],…”

Section: Directions For Future Research (mentioning)

confidence: 99%

Cross-modal text and visual generation: A systematic review. Part 1: Image to text

Żelaszczyk

Mańdziuk

2023

Information Fusion

“…W. Wei et al [28] think dialogue reading comprehension is also used for intelligent human-computer interaction systems. Compared to the more mature two-party dialogue MRC [29], [30], one would expect applications such as dialogue systems to be able to handle more complex multi-party dialogue MRC. Due to the excellent performance of PrLMs in text-level NLP tasks (Section II-A), PrLMs have been widely used in the processing of multi-party dialogue MRC in earlier studies [31], [32].…”

Section: B Transformers For Learning Dialogue (mentioning)

confidence: 99%

Dialogue Logic Aware and Key Utterance Decoupling Model for Multi-Party Dialogue Reading Comprehension

Yang

Gao

et al. 2023

IEEE Access

Multi-party dialogue machine reading comprehension (MRC) brings an unprecedented challenge due to the multiple speakers and the complex discourse linkages among speaker-aware utterances. The majority of current methods only consider the textual aspects of dialogue situations, and pay little attention to crucial speaker-aware cues. This prevents a model from capturing the speaker's intention and important discourse information for questions in a complex discourse relationship, leading the model to give wrong answers. In this paper, we construct a dialogue logic graph module using a relational graph convolutional network (R-GCN) to structure the dialogue information, and design a speaker prediction task to enhance the ability to capture discourse logic. Additionally, we construct a key utterance information decoupling module that focuses on the key discourse information flow involving questions and filters out noise. Extensive experiments on FriendsQA and Molweni show that our approach outperforms competitive baselines and current state-of-the-art models, especially when dealing with more rounds of dialogue and questions involving people, events and time.

“…Since knowledge plays a vital role in the response generation of task-oriented dialog systems, we first conduct the knowledge acquisition for the given multimodal context. Considering the semantic knowledge is pivotal to capturing the user's intentions [33,36,37], we focus on selecting two kinds of semantic knowledge: attribute knowledge and relation knowledge. Thereinto, the attribute knowledge, which is widely used, refers to the attribute-value pairs of entities mentioned directly in the context.…”

Section: Dual Semantic Knowledge Acquisition (mentioning)

confidence: 99%

Comparison of Features in Content Coverage and Presentation of Complex Numbers in Textbooks of China, Japan and Singapore

Chen

Jun

2021

School Mathematics Textbooks in China

Textual response generation is an essential task for multimodal task-oriented dialog systems. Although existing studies have achieved fruitful progress, they still suffer from two critical limitations: 1) focusing on the attribute knowledge but ignoring the relation knowledge that can reveal the correlations between different entities and hence promote the response generation, and 2) only conducting the cross-entropy loss based output-level supervision but lacking the representation-level regularization. To address these limitations, we devise a novel multimodal task-oriented dialog system (named MDS-S²). Specifically, MDS-S² first simultaneously acquires the context-related attribute and relation knowledge from the knowledge base, whereby the non-intuitive relation knowledge is extracted by the n-hop graph walk. Thereafter, considering that the attribute knowledge and relation knowledge can benefit responding to different levels of questions, we design a multi-level knowledge composition module in MDS-S² to obtain the latent composed response representation. Moreover, we devise a set of latent query variables to distill the semantic information from the composed response representation and the ground truth response representation, respectively, and thus conduct the representation-level semantic regularization. Extensive experiments on a public dataset have verified the superiority of our proposed MDS-S². We have released the codes and parameters to facilitate the research community.
