Abstract: Data-to-Text Generation (DTG) is a subfield of Natural Language Generation that aims to transcribe structured data into natural language descriptions.
“…Multiple terminologies, such as faithfulness [20,22,50,117,133,144,163,172,195,219], factual consistency [18,19,24,154,157,194], fidelity [23], factualness [146], factuality [33],…”
Section: Human Evaluation
“…This corpus filtering method consists of several steps: (1) measure the quality of training samples with respect to hallucination, which could utilize the metrics described above; (2) rank these hallucination scores in descending order; (3) select and filter out the untrustworthy samples at the bottom. Instance-level scores can lead to a signal loss because divergences occur at the word level, i.e., parts of the target sentence are loyal to the source input while others diverge [146].…”
Section: Clean Data Automatically
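The three filtering steps quoted above can be sketched in a few lines. The scoring function below is a deliberately toy proxy (fraction of target tokens absent from the source); a real pipeline would substitute one of the hallucination metrics the survey describes, and the field names, data, and `keep_ratio` threshold are illustrative assumptions.

```python
def hallucination_score(source: str, target: str) -> float:
    """Toy proxy metric: fraction of target tokens absent from the source.
    A real pipeline would plug in a learned or statistical metric here."""
    source_tokens = set(source.lower().split())
    target_tokens = target.lower().split()
    if not target_tokens:
        return 0.0
    unsupported = [t for t in target_tokens if t not in source_tokens]
    return len(unsupported) / len(target_tokens)

def filter_corpus(pairs, keep_ratio=0.8):
    """(1) score each (source, target) pair; (2) rank by hallucination score
    (least hallucinated first); (3) drop the worst-scoring tail."""
    scored = sorted(pairs, key=lambda p: hallucination_score(*p))
    cutoff = int(len(scored) * keep_ratio)
    return scored[:cutoff]

# Invented two-example corpus: the second target mentions facts ("MVP award",
# "Paris") with no support in the source record, so it scores worse.
corpus = [
    ("name: John | team: Lakers", "John plays for the Lakers."),
    ("name: John | team: Lakers", "John won the MVP award in Paris."),
]
clean = filter_corpus(corpus, keep_ratio=0.5)
```

This illustrates the signal-loss caveat from the quote as well: the whole second instance is discarded even though its first two words are faithful, which is exactly the motivation for the word-level labels discussed later.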
“…The decoder converts vector representations into natural language [177], and this stage can contribute to hallucinations due to the limitations of existing decoding strategies. There is also work modifying the decoder structure to mitigate hallucination, such as the multi-branch decoder [146], the uncertainty-aware decoder [195], a dual decoder consisting of a sequential decoder and a tree-based decoder [160], and a constrained decoder with lexical or structural limitations [7]. These decoders increase the probability of faithful tokens while reducing the probability of hallucinatory tokens during inference, either by modeling the implicit discrepancies and dependencies between tokens or by imposing explicit constraints.…”
Section: Information Augmentation
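As a minimal illustration of the "explicit constraints" idea (not a reproduction of any specific cited decoder), the sketch below applies a fixed logit penalty at one decoding step to content tokens unsupported by the source, raising the relative probability of faithful tokens. The vocabulary, logits, function-word list, and penalty value are all invented for the example.

```python
import math

def constrain_logits(logits, vocab, source_tokens, penalty=5.0):
    """Subtract a fixed penalty from the logit of every content token that
    does not appear in the source, so source-supported tokens win out."""
    function_words = {"the", "a", "is", "in", "of", "."}
    out = []
    for logit, tok in zip(logits, vocab):
        if tok in source_tokens or tok in function_words:
            out.append(logit)
        else:
            out.append(logit - penalty)  # down-weight unsupported content
    return out

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# One decoding step over a toy 4-token vocabulary: the unconstrained model
# prefers "celtics", which the source record does not support.
vocab = ["lakers", "celtics", "the", "."]
source_tokens = {"john", "lakers"}
raw_logits = [2.0, 2.5, 0.5, 0.0]
probs = softmax(constrain_logits(raw_logits, vocab, source_tokens))
best = vocab[probs.index(max(probs))]  # the faithful token now ranks first
```

A hard penalty like this is the crudest form of lexical constraint; the cited decoders instead learn where to trust the source, but the effect at inference time is the same redistribution of probability mass toward faithful tokens.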
“…Liu et al. [107] select training instances based on faithfulness ranking. At a finer granularity than this instance-level method, Rebuffel et al. [146] label tokens in a pre-processing step according to co-occurrence analysis and sentence structure (via dependency parsing) to make explicit the correspondence between the input table and the text.…”
Section: Hallucination Mitigation in Data-to-Text Generation
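The co-occurrence half of this word-level labelling can be sketched simply: mark each target token as aligned (1) when it matches a value in the input table, else 0. Rebuffel et al. additionally propagate labels through dependency parses; that step is omitted here, and the table and sentence are invented for the example.

```python
def cooccurrence_labels(table: dict, sentence: str):
    """Return (token, label) pairs; label 1 means the token co-occurs with
    a value in the input table, 0 means it is unsupported."""
    table_values = {w.lower() for v in table.values() for w in str(v).split()}
    labels = []
    for tok in sentence.split():
        clean = tok.strip(".,").lower()  # ignore trailing punctuation
        labels.append((tok, 1 if clean in table_values else 0))
    return labels

# Toy record and target sentence: "scored" and "for" get label 0 because
# only table values (not relations) are checked by pure co-occurrence.
table = {"name": "John Doe", "team": "Lakers", "points": 21}
labels = cooccurrence_labels(table, "John scored 21 points for the Lakers.")
```

These per-token labels are precisely the word-level supervision that the multi-branch decoder described next consumes at training time.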
“…In order to mitigate hallucinations at the inference step, Rebuffel et al. [146] propose a Multi-Branch Decoder that leverages word-level alignment labels between the input table and its paired text to learn the relevant parts of each training instance. These word-level labels are obtained through dependency parsing during the pre-processing step.…”
Section: Hallucination Mitigation in Data-to-Text Generation
Natural Language Generation (NLG) has improved substantially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent text generation and, in turn, to progress in downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinating unintended text, which degrades system performance and fails to meet user expectations in many real-world scenarios. To address this issue, there have been studies on measuring and mitigating hallucinated text, but there has not been a comprehensive review of the state of the art in hallucination detection and mitigation. In this survey, we provide a broad overview of the research progress and challenges of the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucination in a large set of downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
We report an error analysis of outputs from four Table-to-Text generation models fine-tuned on ToTTo, an open-domain English-language dataset. We carried out a manual error annotation of a subset of outputs (3,016 sentences in total) on the topic of Politics generated by these four models. Our error annotation focused on eight categories of errors. The analysis shows that more than 46% of the sentences from each of the four models are error-free. It also uncovered some specific classes of errors; for example, WORD errors (mostly verbs and prepositions) are the dominant errors in all four models and the most complex among them. NAME (mostly nouns) and NUMBER errors are slightly higher in two of the GEM benchmark models, whereas DATE_DIMENSION and OTHER error categories are more common in our Table-to-Text model. This in-depth error analysis is currently guiding us in improving our Table-to-Text model.