2021
DOI: 10.48550/arxiv.2107.07261
Preprint

Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills

Abstract: Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning. In this work, we propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs, where answering the question requires reasoning over multiple facts in the paragraph. We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills such as number comparison…
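To make the generation idea in the abstract concrete, here is a minimal sketch (not the authors' released pipeline): a toy table is verbalized into a context paragraph and paired with a question whose answer requires comparing numbers across several rows, one of the 16 reasoning skills mentioned above. The table contents, templates, and function names are illustrative assumptions.

# Illustrative sketch only: not the paper's generation code. A synthetic
# question-paragraph pair is built from a toy table for the "number
# comparison" skill; the table, templates, and names here are assumptions.
from typing import Dict, List, Tuple

def row_to_sentence(row: Dict[str, str], title: str) -> str:
    """Verbalize one table row as a simple sentence."""
    facts = ", ".join(f"the {col} is {val}" for col, val in row.items())
    return f"In {title}, {facts}."

def make_number_comparison_example(
    table: List[Dict[str, str]], title: str, key_col: str, num_col: str
) -> Tuple[str, str, str]:
    """Build one (paragraph, question, answer) triple for the
    'number comparison' reasoning skill."""
    paragraph = " ".join(row_to_sentence(r, title) for r in table)
    best = max(table, key=lambda r: float(r[num_col]))
    question = f"Which {key_col} in {title} has the highest {num_col}?"
    return paragraph, question, best[key_col]

if __name__ == "__main__":
    # Toy rows standing in for a semi-structured Wikipedia table.
    table = [
        {"country": "Norway", "gold medals": "14"},
        {"country": "Germany", "gold medals": "12"},
        {"country": "Canada", "gold medals": "11"},
    ]
    paragraph, question, answer = make_number_comparison_example(
        table, "the 2018 Winter Olympics medal table", "country", "gold medals"
    )
    print(paragraph)
    print("Q:", question)
    print("A:", answer)  # -> Norway

In the paper, such pairs are generated at scale from real Wikipedia tables and used in an additional pre-training step before fine-tuning.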

Cited by 8 publications (8 citation statements)
References 24 publications (43 reference statements)
“…Thus, more effort needs to be put into using semi-structured data for model evaluation. Such approaches can be applied to other datasets such as WikiTableQA (Pasupat and Liang, 2015), TabFact (Chen et al, 2019), HybridQA (Chen et al, 2020b; Zayats et al, 2021; Oguz et al, 2020), OpenTableQA (Chen et al, 2021), table-to-text generation tasks such as ToTTo (Parikh et al, 2020), Turning Tables (Yoran et al, 2021), and LogicTable (Chen et al, 2020a), and recently proposed tabular reasoning models such as TAPAS (Müller et al, 2021; Herzig et al, 2020), TaBERT (Yin et al, 2020), TABBIE (Iida et al, 2021), TabGCN (Pramanick and Bhattacharya, 2021) and RCI (Glass et al, 2021).…”
Section: Discussion and Related Work (mentioning)
confidence: 99%
“…Given a table and an executable SQL query, TaPEx uses the query's execution result (obtained through an off-the-shelf SQL executor, e.g., MySQL) to supervise the TaLM as a neural executor. Yoran et al [103] generate at scale question-paragraph pairs that require different reasoning skills to enhance the numerical reasoning abilities in table QA. [69], described above, also has benefits for this task.…”
Section: Objectives By Downstream Tasks (mentioning)
confidence: 99%
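As a rough illustration of the TaPEx-style supervision described in the statement above, the sketch below executes a SQL query over a tiny table with an off-the-shelf executor (sqlite3 standing in for, e.g., MySQL) and pairs the flattened table plus query with the execution result as the training target. The table, query, and flattening scheme are assumptions for illustration, not TaPEx's actual preprocessing.

# Minimal sketch (assumed details) of the supervision signal in the quote
# above: an off-the-shelf SQL executor produces the target that the table
# language model is trained to predict, making it act as a "neural executor".
import sqlite3

def build_pretraining_example(rows, query):
    """Return (model_input, target) where target is the query's execution result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE medals (country TEXT, gold INTEGER)")
    conn.executemany("INSERT INTO medals VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    # Flatten the table and pair it with the query; the execution result
    # becomes the sequence the model must generate.
    flat_table = " | ".join(f"{country} : {gold}" for country, gold in rows)
    target = ", ".join(str(value) for row in result for value in row)
    return f"{query} [TABLE] {flat_table}", target

if __name__ == "__main__":
    rows = [("Norway", 14), ("Germany", 12), ("Canada", 11)]
    model_input, target = build_pretraining_example(
        rows, "SELECT country FROM medals WHERE gold > 12"
    )
    print(model_input)
    print("target:", target)  # -> Norway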
“…Tabular Reasoning. Recent studies investigate various NLP tasks on semi-structured tabular data, including tabular NLI and fact verification (Gupta et al, 2020; Zhang and Balog, 2019), tabular probing, various question answering and semantic parsing tasks (Pasupat and Liang, 2015; Krishnamurthy et al, 2017; Abbas et al, 2016; Sun et al, 2016; Chen et al, 2020b; Lin et al, 2020; Zayats et al, 2021; Oguz et al, 2020; Chen et al, 2021, inter alia), and table-to-text generation (e.g., Nan et al, 2021; Yoran et al, 2021; Chen et al, 2020a). Several strategies for representing Wikipedia relational tables were recently proposed, such as TAPAS (Herzig et al, 2020), TaBERT (Yin et al, 2020), TabStruc, TABBIE (Iida et al, 2021), TabGCN (Pramanick and Bhattacharya, 2021) and RCI (Glass et al, 2021).…”
Section: Related Work (mentioning)
confidence: 99%