Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/P17-1017

Creating Training Corpora for NLG Micro-Planners

Abstract: In this paper, we present a novel framework for semi-automatically creating linguistically challenging microplanning data-to-text corpora from existing Knowledge Bases. Because our method pairs data of varying size and shape with texts ranging from simple clauses to short texts, a dataset created using this framework provides a challenging benchmark for microplanning. Another feature of this framework is that it can be applied to any large-scale knowledge base and can therefore be used to train and learn KB verbalisers…
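For concreteness, a minimal Python sketch of the kind of data-text pair such a corpus contains: a set of KB triples of varying size paired with a short verbalisation. The entity, predicates, and wording below are illustrative assumptions in the style of the WebNLG data, not records from the released dataset.

# A hypothetical WebNLG-style input: a small set of DBpedia-like triples
triples = [
    ("John_E_Blaha", "birthDate", "1942-08-26"),
    ("John_E_Blaha", "occupation", "Fighter_pilot"),
    ("John_E_Blaha", "birthPlace", "San_Antonio"),
]
# ...paired with a short human-authored verbalisation of those triples
reference_text = ("John E. Blaha, born on 26 August 1942 in San Antonio, "
                  "served as a fighter pilot.")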

Cited by 331 publications (287 citation statements); references 11 publications (5 reference statements).
“…To check how PARENT correlates with human judgments when the references are elicited from humans (and less likely to be divergent), we check its correlation with the human ratings provided for the systems competing in the WebNLG challenge (Gardent et al, 2017). The task is to generate text describing 1-5 RDF triples (e.g.…”
Section: WebNLG Dataset (citation type: mentioning; confidence: 99%)
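A minimal sketch of the correlation check this excerpt describes, assuming one metric score and one averaged human rating per competing system; the values below are made up for illustration, not challenge results.

from scipy.stats import pearsonr, spearmanr

# Hypothetical per-system scores: a PARENT-style metric score and an
# averaged human rating for each competing system (made-up values)
metric_scores = [0.42, 0.55, 0.38, 0.61, 0.47]
human_ratings = [3.1, 3.8, 2.9, 4.2, 3.3]

r, p = pearsonr(metric_scores, human_ratings)          # linear correlation
rho, p_rho = spearmanr(metric_scores, human_ratings)   # rank correlation
print(f"Pearson r={r:.3f} (p={p:.3f}); Spearman rho={rho:.3f} (p={p_rho:.3f})")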
“…The E2E dataset (Novikova et al, 2017b) consists of 47K restaurant descriptions based on 5.7K distinct inputs of 3-8 attributes (name, area, near, eat type, food, price range, family friendly, rating), split into 4862 inputs for training, 547 for development and 630 for testing. The WebNLG dataset (Gardent et al, 2017a) […]. To preprocess both datasets, we lowercase all inputs and references and represent the inputs in the bracketed format as shown in Figure 1. For the word-based processing we additionally tokenize the texts with the nltk-tokenizer (Bird et al, 2009) and apply delexicalization, as also illustrated in Figure 1.…”
Section: Models (citation type: mentioning; confidence: 99%)
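A minimal sketch of the preprocessing steps this excerpt lists: lowercasing, a bracketed linearization of the input, NLTK tokenization, and delexicalization. The bracket format and the placeholder scheme (delexicalizing only the name and near slots) are assumptions for illustration, not the cited paper's code.

# Requires: pip install nltk, then nltk.download('punkt') for tokenizer models
import nltk

def preprocess(attributes, reference):
    # Lowercase all inputs and references
    attributes = {k.lower(): v.lower() for k, v in attributes.items()}
    reference = reference.lower()
    # Delexicalization: replace slot values that recur verbatim in the text
    # with placeholder tokens (restricting this to name/near is an assumption)
    for slot in ("name", "near"):
        if slot in attributes:
            reference = reference.replace(attributes[slot], f"__{slot}__")
            attributes[slot] = f"__{slot}__"
    # Represent the input in a bracketed format (exact format is assumed)
    source = " ".join(f"[{k}] {v}" for k, v in sorted(attributes.items()))
    # Word-based processing: tokenize the text with the NLTK tokenizer
    tokens = nltk.word_tokenize(reference)
    return source, tokens

src, toks = preprocess(
    {"name": "The Eagle", "food": "French", "area": "riverside"},
    "The Eagle serves French food in the riverside area.")
# src  -> '[area] riverside [food] french [name] __name__'
# toks -> ['__name__', 'serves', 'french', 'food', 'in', 'the',
#          'riverside', 'area', '.']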
“…WEBNLGMODEL: The WEBNLGMODEL is designed to be trained and tested on the WEBNLG dataset. An already trained WEBNLGMODEL (similar to the one by Gardent et al. (2017)) is evaluated on the WIKITABLEPARA and WIKIBIO datasets. For the WIKITABLEPARA dataset, we convert every table to M × (N − 1) triples.…”
Section: Methods (citation type: mentioning; confidence: 99%)
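One plausible reading of the M × (N − 1) conversion, assuming each of a table's M rows contributes its first cell as the subject and each of the remaining N − 1 columns contributes a (subject, column-header, cell) triple. This is an assumption about the cited setup, not its published code.

def table_to_triples(header, rows):
    """Convert an M x N table to M * (N - 1) triples.

    Assumes the first column holds each row's subject entity and every
    remaining column header names the predicate for that cell's value.
    """
    triples = []
    for row in rows:
        subject = row[0]
        for pred, obj in zip(header[1:], row[1:]):
            triples.append((subject, pred, obj))
    return triples

# Example: a 2 x 3 table yields 2 * (3 - 1) = 4 triples
header = ["name", "birthPlace", "occupation"]
rows = [["Ada Lovelace", "London", "Mathematician"],
        ["Alan Turing", "Maida Vale", "Computer scientist"]]
print(table_to_triples(header, rows))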