Proceedings of the 12th International Conference on Natural Language Generation 2019
DOI: 10.18653/v1/w19-8652

Semantic Noise Matters for Neural Natural Language Generation

Abstract: Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e., generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms. We find that cleaned data can improve semantic correctness by up to 97%, while maintaining fluency. We also find that the most common error is omitting information, rather than hallucination.
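
The abstract's distinction between omissions (input information missing from the output) and hallucinations (output content unsupported by the input) is typically measured by slot-level matching against the meaning representation (MR). The sketch below is a minimal, hypothetical illustration of such a check for E2E-style MRs; the function name, the example MR, and the naive verbatim matching are assumptions for illustration, not the authors' actual hand-crafted per-slot patterns.

```python
import re

# Hypothetical slot-level checker for E2E-style MRs. The paper's cleaning
# relies on hand-crafted per-slot patterns; the verbatim matching below
# is a simplifying assumption for illustration only.

def check_slots(mr, text):
    """Split MR slots into realised vs. omitted for a generated utterance.

    mr:   dict of slot -> value, e.g. {"name": "The Eagle"}
    text: the system output to check
    """
    realised, omitted = [], []
    for slot, value in mr.items():
        # Naive verbatim check; real patterns also cover paraphrases
        # ("priceRange=cheap" -> "low-priced", "inexpensive", ...).
        if re.search(re.escape(value), text, re.IGNORECASE):
            realised.append(slot)
        else:
            omitted.append(slot)  # omission: MR info missing from output
    # Hallucination checking would additionally scan the output for
    # slot values that are *not* in the MR; it is omitted here.
    return realised, omitted

mr = {"name": "The Eagle", "eatType": "coffee shop", "area": "riverside"}
print(check_slots(mr, "The Eagle is a coffee shop."))
# (['name', 'eatType'], ['area'])  -> 'area' is omitted
```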

Cited by 80 publications (105 citation statements) | References 28 publications

“…The former path is risky as it easily results in ungrammatical targets. The latter approach of enforcing a stronger alignment between inputs and outputs has been tried previously, but it assumes a moderate amount of noise in the data (Nie et al., 2019; Dušek et al., 2019). Alternatively, one can leave the data as is and try to put more pressure on the decoder to pay attention to the input at every generation step (Tian et al., 2019).…”
Section: Introduction
confidence: 99%
“…But this is not a straightforward task. Specially designed regular expressions (Dušek et al., 2019a) or heuristics involving dependency relations (Oraby et al., 2019) must be used. Augmented Input Sequence: Once the surface forms of each attribute-value pair in a target utterance are found, we add them to the input sequence, as shown in Figure 2.…”
Section: Methods
confidence: 99%
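
The "augmented input sequence" quoted above pairs each attribute-value token with the tokens of its surface form from the target utterance. The following sketch is a hypothetical reconstruction of that format; the `<attr=value>` token notation and the function below are illustrative assumptions, not the cited paper's exact scheme (cf. its Figure 2).

```python
# Hypothetical reconstruction of the augmented input sequence: each
# attribute-value pair becomes one special token, immediately followed
# by the tokens of its surface form found in the target utterance.

def augment_input(mr_pairs, surface_forms):
    """mr_pairs:      list of (attribute, value) tuples from the MR
    surface_forms: dict (attribute, value) -> extracted surface string"""
    tokens = []
    for attr, val in mr_pairs:
        tokens.append(f"<{attr}={val}>")          # one token per pair
        sf = surface_forms.get((attr, val), "")   # may be absent/noisy
        tokens.extend(sf.split())                 # its surface-form tokens
    return tokens

pairs = [("name", "The Eagle"), ("eatType", "coffee shop")]
sfs = {("name", "The Eagle"): "The Eagle",
       ("eatType", "coffee shop"): "coffee shop"}
print(augment_input(pairs, sfs))
# ['<name=The Eagle>', 'The', 'Eagle', '<eatType=coffee shop>', 'coffee', 'shop']
```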
“…To extract the surface form of each attribute-value pair from a target utterance, we used modified regular expressions from Dušek et al. (2019a). The input sequence was constructed in the format of a single token representing an attribute-value pair followed by multiple tokens for the surface form, e.g.…”
Section: Applying the Surface Forms Methods
confidence: 99%
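
The excerpt above leaves its format example elided ("e.g.…"), so the sketch below shows only the general shape of regex-based surface-form extraction. The slot patterns are invented for illustration and are NOT the modified regexes of Dušek et al. (2019a).

```python
import re

# Invented per-slot patterns in the spirit of the modified regexes the
# excerpt mentions; not the actual patterns of Dušek et al. (2019a).
SLOT_PATTERNS = {
    ("familyFriendly", "yes"): re.compile(r"\b(family[- ]friendly|kid[- ]friendly)\b", re.I),
    ("priceRange", "cheap"):   re.compile(r"\b(cheap|low[- ]priced|inexpensive)\b", re.I),
}

def extract_surface_form(slot, value, utterance):
    """Return the surface form of (slot, value) in the utterance, or None."""
    pattern = SLOT_PATTERNS.get((slot, value))
    if pattern is None:
        # Fallback: match the literal slot value.
        pattern = re.compile(re.escape(value), re.I)
    m = pattern.search(utterance)
    return m.group(0) if m else None

print(extract_surface_form("priceRange", "cheap",
                           "A low-priced, family friendly venue."))
# 'low-priced'
```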
“…In particular, this can be very likely if “the girl wants” appears much more frequently than “the boy wants” in the training corpus. This is a very important issue because of its wide existence across many neural graph-to-text generation models, hindering the usability of these models for real-world applications (Dušek et al., 2018, 2019; Balakrishnan et al., 2019).…”
Section: Introduction
confidence: 99%