End-to-End Content and Plan Selection for Data-to-Text Generation

Gehrmann, Sebastian; Dai, Falcon Z.; Elder, Henry; Rush, Alexander M.

doi:10.48550/arxiv.1810.04700

Cited by 2 publications

(2 citation statements)

References 0 publications


“…Pointer Generator (See et al, 2017) A LSTM-based seq2seq model with copy mechanism. While originally designed for text summarization, it is also used in data-to-text (Gehrmann et al, 2018). BERT-to-BERT (Rothe et al, 2020) A transformer encoder-decoder model (Vaswani et al, 2017) initialized with BERT (Devlin et al, 2018).…”

Section: Methods (mentioning)

confidence: 99%

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation

Cheng, Dong, Wang et al. 2022

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


Tables are often created with hierarchies, but existing works on table reasoning mainly focus on flat tables and neglect hierarchical tables. Hierarchical tables challenge table reasoning by complex hierarchical indexing, as well as implicit relationships of calculation and semantics. We present a new dataset, HiTab, to study question answering (QA) and natural language generation (NLG) over hierarchical tables. HiTab is a cross-domain dataset constructed from a wealth of statistical reports and Wikipedia pages, and has unique characteristics: (1) nearly all tables are hierarchical; (2) questions are not proposed by annotators from scratch, but are revised from real and meaningful sentences authored by analysts; and (3) to reveal complex numerical reasoning in analysis, we provide fine-grained annotations of quantity and entity alignment. Experimental results show that HiTab presents a strong challenge for existing baselines and a valuable benchmark for future research. Targeting hierarchical structure, we devise an effective hierarchy-aware logical form for symbolic reasoning over tables. Furthermore, we leverage entity and quantity alignment to explore partially supervised training in QA and conditional generation in NLG, which largely reduces spurious predictions in QA and meaningless descriptions in NLG. The dataset and code are available at https://github.com/microsoft/HiTab.


“…A prime example is the delexicalization technique used by most current generators (e.g., Oh and Rudnicky, 2000; Mairesse et al, 2010; Wen et al, 2015a,b; Juraska et al, 2018): It is generally assumed that attribute (slot) values from the input meaning representation (MR) can be replaced by placeholders during generation and inserted into the output verbatim. Delexicalization or an analogous technique, such as a copy mechanism (Gu et al, 2016; Gehrmann et al, 2018), is required for most generation scenarios to allow generalization to unseen entity names: sets of entities are open (potentially infinite and subject to change) while training data is scarce. However, the verbatim insertion assumption does not hold for languages with extensive noun inflection - attribute values need to be inflected here to produce fluent outputs (see Figure 1).…”

Section: Introduction (mentioning)

confidence: 99%

Neural Generation for Czech: Data and Baselines

Dušek, Jurčíček 2019

Proceedings of the 12th International Conference on Natural Language Generation


We present the first dataset targeted at end-to-end NLG in Czech in the restaurant domain, along with several strong baseline models using the sequence-to-sequence approach. While non-English NLG is under-explored in general, Czech, as a morphologically rich language, makes the task even harder: Since Czech requires inflecting named entities, delexicalization or copy mechanisms do not work out-of-the-box and lexicalizing the generated outputs is non-trivial. In our experiments, we present two different approaches to this problem: (1) using a neural language model to select the correct inflected form while lexicalizing, (2) a two-step generation setup: our sequence-to-sequence model generates an interleaved sequence of lemmas and morphological tags, which are then inflected by a morphological generator. Using both automatic and manual evaluation in Section 4, we show that our extensions improve
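The delexicalization technique mentioned in the citation statement above (slot values replaced by placeholders, then inserted back verbatim) can be illustrated with a minimal sketch. This is not code from any of the cited papers: the function names `delexicalize` and `relexicalize`, the `<slot>` placeholder syntax, and the English restaurant example are all assumptions made for illustration.

```python
def delexicalize(text: str, mr: dict[str, str]) -> str:
    """Replace each slot value found in `text` with a <slot> placeholder."""
    # Longer values first, so a value that contains another is replaced whole.
    for slot, value in sorted(mr.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(value, f"<{slot}>")
    return text


def relexicalize(template: str, mr: dict[str, str]) -> str:
    """Insert slot values verbatim, with no inflection (the assumption at issue)."""
    for slot, value in mr.items():
        template = template.replace(f"<{slot}>", value)
    return template


# Hypothetical meaning representations (MRs) for illustration only.
train_mr = {"name": "Blue Spice", "area": "riverside"}
template = delexicalize("Blue Spice is a restaurant in the riverside area.", train_mr)

unseen_mr = {"name": "Green Man", "area": "city centre"}
output = relexicalize(template, unseen_mr)
```

Because `relexicalize` copies each value unchanged, the sketch generalizes to unseen entity names in English, but it would produce uninflected forms in a language such as Czech, which is the limitation the citing paper addresses.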
