Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
DOI: 10.18653/v1/2022.emnlp-main.280

Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue

Abstract: Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning in generating answer sentences, existing dialogue systems still suffer from a text hallucination problem, which denotes indiscriminate text-copying from input texts without an understanding of the question. This arises from learning spurious correlations, since answer sentences in the dataset usually include the words of the input texts…
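The abstract is truncated before the method itself, but the information-theoretic framing named in the title can be illustrated with a minimal, hedged sketch: penalize an estimate of the mutual information between the generated answer's representation and the input-text representation, so the decoder is discouraged from indiscriminate copying. This is not the paper's implementation; CopyCritic, mi_lower_bound, and lambda_mi below are hypothetical names, and the estimator is a generic MINE-style (Donsker-Varadhan) lower bound.

```python
# Hedged sketch of an information-theoretic "anti-copying" regularizer for a
# sequence-to-sequence dialogue model. Not the THAM implementation; all names
# here are illustrative assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CopyCritic(nn.Module):
    """Small critic scoring (answer, input-text) representation pairs."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, ans_repr: torch.Tensor, txt_repr: torch.Tensor) -> torch.Tensor:
        # ans_repr, txt_repr: (batch, dim) pooled representations
        return self.net(torch.cat([ans_repr, txt_repr], dim=-1)).squeeze(-1)

def mi_lower_bound(critic: CopyCritic, ans_repr: torch.Tensor,
                   txt_repr: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan lower bound on I(answer; input text).

    Joint pairs use aligned rows; marginal pairs shuffle the input-text
    representations within the batch.
    """
    joint = critic(ans_repr, txt_repr)                            # (B,)
    perm = torch.randperm(txt_repr.size(0), device=txt_repr.device)
    marginal = critic(ans_repr, txt_repr[perm])                   # (B,)
    # E_joint[T] - log E_marginal[exp(T)]
    return joint.mean() - (torch.logsumexp(marginal, dim=0) - math.log(marginal.size(0)))

def training_loss(lm_logits: torch.Tensor, labels: torch.Tensor,
                  ans_repr: torch.Tensor, txt_repr: torch.Tensor,
                  critic: CopyCritic, lambda_mi: float = 0.1) -> torch.Tensor:
    """Standard generation cross-entropy plus an MI penalty that discourages
    the decoder from blindly copying the dialogue history."""
    ce = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                         labels.view(-1), ignore_index=-100)
    mi = mi_lower_bound(critic, ans_repr, txt_repr)
    return ce + lambda_mi * mi
```

In such a setup the critic would typically be trained to maximize the bound (so it remains a tight MI estimate) while the generator minimizes the combined loss; the weight lambda_mi trades answer fluency against copying behavior.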

Cited by 2 publications (4 citation statements)
References 17 publications (27 reference statements)
“…HEAR shows state-of-the-art performance on all the metrics compared to previous works (please refer to Related Work for their detailed descriptions). Our baseline DLM is the T5 Transformer (Raffel et al., 2020), which is the same baseline (i.e., T5RLM) as THAM (Yoon et al., 2022c), but here our proposed SAL shows larger gains, and further improvements are obtained by applying RLE. As our proposed HEAR operates in a model-agnostic manner, we also validate other VGD models with HEAR in Table 2.…”
Section: Results on AVSD Benchmark (mentioning)
confidence: 99%
“…For a sensible decision in SAL, we introduce two technical contributions: (1) Keyword-based Audio Sensing and (2) a Semantic Neural Estimator. HEAR is applied to current runner models (Hori et al., 2019a; Yoon et al., 2022c; Li et al., 2021b) in a model-agnostic manner, and its effectiveness is validated on VGD datasets (i.e., AVSD@DSTC7, AVSD@DSTC8) with steady performance gains on natural language generation metrics.…”
Section: Our Experimental Evidence (mentioning)
confidence: 99%