Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/P17-1017

Creating Training Corpora for NLG Micro-Planners

Abstract: In this paper, we present a novel framework for semi-automatically creating linguistically challenging microplanning data-to-text corpora from existing Knowledge Bases. Because our method pairs data of varying size and shape with texts ranging from simple clauses to short texts, a dataset created using this framework provides a challenging benchmark for microplanning. Another feature of this framework is that it can be applied to any large-scale knowledge base and can therefore be used to train and learn KB verbalisers…
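For concreteness, a minimal Python sketch of the kind of data-text pair such a corpus contains: a set of KB triples of varying size paired with a short verbalisation. The entity, predicates, and wording below are illustrative assumptions in the style of the WebNLG data, not records from the released dataset.

# A hypothetical WebNLG-style input: a small set of DBpedia-like triples
triples = [
    ("John_E_Blaha", "birthDate", "1942-08-26"),
    ("John_E_Blaha", "occupation", "Fighter_pilot"),
    ("John_E_Blaha", "birthPlace", "San_Antonio"),
]
# ...paired with a short human-authored verbalisation of those triples
reference_text = ("John E. Blaha, born on 26 August 1942 in San Antonio, "
                  "served as a fighter pilot.")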

Cited by 331 publications (287 citation statements); references 11 publications (5 reference statements).
“…To check how PARENT correlates with human judgments when the references are elicited from humans (and less likely to be divergent), we check its correlation with the human ratings provided for the systems competing in the WebNLG challenge (Gardent et al, 2017). The task is to generate text describing 1-5 RDF triples (e.g.…”
Section: WebNLG Dataset (citation type: mentioning; confidence: 99%)
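A minimal sketch of the correlation check this excerpt describes, assuming one metric score and one averaged human rating per competing system; the values below are made up for illustration, not challenge results.

from scipy.stats import pearsonr, spearmanr

# Hypothetical per-system scores: a PARENT-style metric score and an
# averaged human rating for each competing system (made-up values)
metric_scores = [0.42, 0.55, 0.38, 0.61, 0.47]
human_ratings = [3.1, 3.8, 2.9, 4.2, 3.3]

r, p = pearsonr(metric_scores, human_ratings)          # linear correlation
rho, p_rho = spearmanr(metric_scores, human_ratings)   # rank correlation
print(f"Pearson r={r:.3f} (p={p:.3f}); Spearman rho={rho:.3f} (p={p_rho:.3f})")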
“…The E2E dataset (Novikova et al, 2017b) consists of 47K restaurant descriptions based on 5.7K distinct inputs of 3-8 attributes (name, area, near, eat type, food, price range, family friendly, rating), split into 4862 inputs for training, 547 for development and 630 for testing. The WebNLG dataset (Gardent et al, 2017a) […]. To preprocess both datasets, we lowercase all inputs and references and represent the inputs in the bracketed format as shown in Figure 1. For the word-based processing we additionally tokenize the texts with the nltk-tokenizer (Bird et al, 2009) and apply delexicalization, as also illustrated in Figure 1.…”
Section: Models (citation type: mentioning; confidence: 99%)
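A minimal sketch of the preprocessing steps this excerpt lists: lowercasing, a bracketed linearization of the input, NLTK tokenization, and delexicalization. The bracket format and the placeholder scheme (delexicalizing only the name and near slots) are assumptions for illustration, not the cited paper's code.

# Requires: pip install nltk, then nltk.download('punkt') for tokenizer models
import nltk

def preprocess(attributes, reference):
    # Lowercase all inputs and references
    attributes = {k.lower(): v.lower() for k, v in attributes.items()}
    reference = reference.lower()
    # Delexicalization: replace slot values that recur verbatim in the text
    # with placeholder tokens (restricting this to name/near is an assumption)
    for slot in ("name", "near"):
        if slot in attributes:
            reference = reference.replace(attributes[slot], f"__{slot}__")
            attributes[slot] = f"__{slot}__"
    # Represent the input in a bracketed format (exact format is assumed)
    source = " ".join(f"[{k}] {v}" for k, v in sorted(attributes.items()))
    # Word-based processing: tokenize the text with the NLTK tokenizer
    tokens = nltk.word_tokenize(reference)
    return source, tokens

src, toks = preprocess(
    {"name": "The Eagle", "food": "French", "area": "riverside"},
    "The Eagle serves French food in the riverside area.")
# src  -> '[area] riverside [food] french [name] __name__'
# toks -> ['__name__', 'serves', 'french', 'food', 'in', 'the',
#          'riverside', 'area', '.']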
“…WEBNLGMODEL: The WEBNLGMODEL is designed to be trained and tested on the WEBNLG dataset. An already trained WEBNLGMODEL (similar to the one by Gardent et al. (2017)) is evaluated on the WIKITABLEPARA and WIKIBIO datasets. For the WIKITABLEPARA dataset, we convert every table to M × (N − 1) triples.…”
Section: Methods (citation type: mentioning; confidence: 99%)
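One plausible reading of the M × (N − 1) conversion, assuming each of a table's M rows contributes its first cell as the subject and each of the remaining N − 1 columns contributes a (subject, column-header, cell) triple. This is an assumption about the cited setup, not its published code.

def table_to_triples(header, rows):
    """Convert an M x N table to M * (N - 1) triples.

    Assumes the first column holds each row's subject entity and every
    remaining column header names the predicate for that cell's value.
    """
    triples = []
    for row in rows:
        subject = row[0]
        for pred, obj in zip(header[1:], row[1:]):
            triples.append((subject, pred, obj))
    return triples

# Example: a 2 x 3 table yields 2 * (3 - 1) = 4 triples
header = ["name", "birthPlace", "occupation"]
rows = [["Ada Lovelace", "London", "Mathematician"],
        ["Alan Turing", "Maida Vale", "Computer scientist"]]
print(table_to_triples(header, rows))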