Proceedings of the 11th International Conference on Natural Language Generation 2018
DOI: 10.18653/v1/w18-6535
Can Neural Generators for Dialogue Learn Sentence Planning and Discourse Structuring?

Abstract: Responses in task-oriented dialogue systems often realize multiple propositions whose ultimate form depends on the use of sentence planning and discourse structuring operations. For example, a recommendation may consist of an explicitly evaluative utterance, e.g. Chanpen Thai is the best option, along with content related by the justification discourse relation, e.g. It has great food and service, that combines multiple propositions into a single phrase. While neural generation methods integrate sentence plannin…

Cited by 30 publications (34 citation statements)
References 50 publications
“…Second, semantic errors were computed following Reed et al (2018), where we implemented a script to estimate the coverage automatically based on regular expression matching. 28 This allowed us to produce an independent estimate of the proportion of outputs with missing or added information (see Table 12).…”
Section: Error Analysis: Input MR Coverage
Citation type: mentioning, confidence: 99%
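The coverage check described in the quotation above can be sketched in a few lines. This is a minimal illustration only: the slot names and regular expressions below are invented for an E2E-style restaurant domain, not the actual patterns used by Reed et al (2018).

```python
import re

# Hypothetical slot patterns; the authors' tuned expressions are not
# given in the source, so these are illustrative assumptions.
SLOT_PATTERNS = {
    "name": re.compile(r"\bchanpen thai\b", re.IGNORECASE),
    "food": re.compile(r"\b(great|good|bad|terrible) food\b", re.IGNORECASE),
    "service": re.compile(r"\b(great|good|bad|terrible) service\b", re.IGNORECASE),
}

def realised_slots(text: str) -> set:
    """Slot names whose pattern matches somewhere in the generated text."""
    return {slot for slot, pattern in SLOT_PATTERNS.items() if pattern.search(text)}

def coverage(mr_slots: set, output: str) -> tuple:
    """Slots missing from the output, and slots added beyond the input MR."""
    found = realised_slots(output)
    return mr_slots - found, found - mr_slots

missing, added = coverage({"name", "food", "service"},
                          "Chanpen Thai has great food.")
# missing == {"service"}, added == set()
```

Aggregating these per-output sets over a test set yields the proportion of outputs with missing or added information that the quoted error analysis reports.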
See 2 more Smart Citations
“…Second, semantic errors were computed following Reed et al (2018), where we implemented a script to estimate the coverage automatically based on regular expression matching. 28 This allowed us to produce an independent estimate of the proportion of outputs with missing or added information (see Table 12).…”
Section: Error Analysis: Input Mr Coveragementioning
confidence: 99%
“…This allowed us to produce an independent estimate of the proportion of outputs with missing or added information (see Table 12). Following Reed et al (2018), we also computed the slot error rate (SER) using this pattern-matching approach and the following formula:

SER = (# missed + # added + # value errors + # repetitions) / # slots (5)

Here, missed stands for slot values missing from the realisations, added denotes additional information not present in the MR (hallucinations), value errors denote correctly realised slots with incorrect values (e.g., specifying low price range instead of high),…”
Section: Error Analysis: Input MR Coverage
Citation type: mentioning, confidence: 99%
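As a worked check of the SER formula, a direct implementation is a one-liner; the error counts in the example below are invented for illustration.

```python
def slot_error_rate(missed: int, added: int, value_errors: int,
                    repetitions: int, n_slots: int) -> float:
    """SER = (# missed + # added + # value errors + # repetitions) / # slots."""
    return (missed + added + value_errors + repetitions) / n_slots

# Invented counts: 2 missed, 1 hallucinated, 0 value errors, 1 repetition, 8 slots
print(slot_error_rate(2, 1, 0, 1, 8))  # 0.5
```

Note that a single output can accumulate several error types at once, so SER can exceed 1.0.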
“…This poses a dual challenge: First, since the MR does not specify these discourse relations, crowdworkers creating the dataset in turn have no instructions on when to use them, and must thus use their own judgment in creating a natural-sounding response. While the E2E organizers tout the resulting response variations as a plus, Reed et al (2018) find that current neural systems are unable to learn to express discourse relations effectively with this dataset, and explore ways of enriching input MRs to do so. Indeed, now that the E2E system outputs have been released, a search through outputs from all participating systems reveals only 43 outputs (0.4% out of 10080) containing contrastive tokens, on a test set containing about 300 contrastive samples.…”
Section: Limitations Of Flat MRs
Citation type: mentioning, confidence: 99%
“…To produce a cleaned version of the E2E data, we used the original human textual references, but paired them with correctly matching MRs. To this end, we reimplemented the slot matching script of Reed et al (2018), which tags MR slots and values using regular expressions. We tuned our expressions based on the first 500 instances from the E2E development set and ran the script on the full dataset, producing corrected MRs for all human references (see Figure 1).…”
Section: Cleaning the Meaning Representations
Citation type: mentioning, confidence: 99%
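The MR-correction step quoted above, i.e. re-tagging each human reference and keeping only the slot values actually found in it, can be sketched as follows. The value patterns are assumptions for illustration; the reimplemented script's tuned expressions are not shown in the source.

```python
import re

# Hypothetical (slot, value) -> pattern table; not the authors' actual expressions.
VALUE_PATTERNS = {
    ("priceRange", "cheap"): re.compile(r"\b(cheap|low[- ]price)", re.IGNORECASE),
    ("priceRange", "high"): re.compile(r"\b(expensive|high[- ]price)", re.IGNORECASE),
    ("familyFriendly", "yes"): re.compile(r"\b(family|child)[- ]friendly\b", re.IGNORECASE),
}

def corrected_mr(reference: str) -> dict:
    """Build an MR containing exactly the slot values found in the reference text."""
    return {slot: value
            for (slot, value), pattern in VALUE_PATTERNS.items()
            if pattern.search(reference)}

print(corrected_mr("An expensive, family-friendly restaurant."))
# {'priceRange': 'high', 'familyFriendly': 'yes'}
```

Pairing each reference with the MR rebuilt this way removes the mismatches between the original MRs and what their human references actually say.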