Abstract: Understanding tables is an important and relevant task that involves understanding table structure as well as being able to compare and contrast information within cells. In this paper, we address this challenge by presenting a new dataset and tasks that address this goal as a shared task, SemEval-2021 Task 9: Fact Verification and Evidence Finding for Tabular Data in Scientific Documents (SEM-TAB-FACTS). Our dataset contains 981 manually generated tables and an auto-generated dataset of 1980 tables providi…
“…More work is needed to make models interpretable, either through explanations or by pointing to the evidence that is used for predictions (e.g., Feng et al., 2018; Serrano and Smith, 2019; Jain and Wallace, 2019; Wiegreffe and Pinter, 2019; DeYoung et al., 2020; Paranjape et al., 2020; Hewitt and Liang, 2019; Niven and Kao, 2019; Ravichander et al., 2021). Many recent shared tasks on reasoning over semi-structured tabular data (such as SemEval 2021 Task 9 [Wang et al., 2021a] and FEVEROUS [Aly et al., 2021]) have highlighted the importance of, and the challenges associated with, evidence extraction for claim verification.…”
Section: Discussion and Related Work (mentioning)
confidence: 99%
“…Dataset Recently, datasets such as TabFact (Chen et al., 2020b) and INFOTABS, and also shared tasks such as SemEval 2021 Task 9 (Wang et al., 2021a) and FEVEROUS (Aly et al., 2021), have sparked interest in tabular NLI research. In this study, we use the INFOTABS dataset for our investigations.…”
Neural models command state-of-the-art performance across NLP tasks, including ones involving “reasoning”. Models claiming to reason about the evidence presented to them should attend to the correct parts of the input while avoiding spurious patterns therein, be self-consistent in their predictions across inputs, and be immune to biases derived from their pre-training in a nuanced, context-sensitive fashion. Do the prevalent *BERT-family of models do so? In this paper, we study this question using the problem of reasoning on tabular data. Tabular inputs are especially well-suited for the study—they admit systematic probes targeting the properties listed above. Our experiments demonstrate that a RoBERTa-based model, representative of the current state-of-the-art, fails at reasoning on the following counts: it (a) ignores relevant parts of the evidence, (b) is over-sensitive to annotation artifacts, and (c) relies on the knowledge encoded in the pre-trained language model rather than the evidence presented in its tabular inputs. Finally, through inoculation experiments, we show that fine-tuning the model on perturbed data does not help it overcome the above challenges.
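One way to read the evidence-use probes described in this abstract is as an ablation check: if deleting the table row that actually supports a hypothesis does not change the model's prediction, the model is probably not grounding its decision in the evidence. The following is a minimal sketch of that idea only; the `predict_label` helper is a hypothetical stand-in for any tabular NLI classifier and is not the authors' actual probing code.

```python
from typing import Callable, Dict, List

# Hypothetical classifier interface: maps (table rows, hypothesis) to a label
# such as "entailed", "refuted", or "neutral". Any table-NLI model could be
# plugged in here.
PredictFn = Callable[[List[Dict[str, str]], str], str]

def evidence_ablation_probe(predict_label: PredictFn,
                            rows: List[Dict[str, str]],
                            evidence_idx: int,
                            hypothesis: str) -> bool:
    """Return True if the prediction is unchanged after the evidence row is
    removed, suggesting the model ignores the relevant evidence."""
    original = predict_label(rows, hypothesis)
    ablated_rows = [r for i, r in enumerate(rows) if i != evidence_idx]
    ablated = predict_label(ablated_rows, hypothesis)
    return original == ablated

# Toy usage with a dummy predictor that always answers "entailed"; the probe
# correctly flags it as ignoring the evidence.
rows = [{"Country": "France", "Capital": "Paris"},
        {"Country": "Spain", "Capital": "Madrid"}]
dummy = lambda table, hyp: "entailed"
print(evidence_ablation_probe(dummy, rows, evidence_idx=0,
                              hypothesis="The capital of France is Paris"))  # True
```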
“…Even though tables are also widely used to convey information, especially in scientific texts, there has been comparatively less work on verifying if a given table supports a statement. To this end, SemEval 2021 Task 9 (Wang et al., 2021) focuses on statement verification and evidence finding for tables from scientific articles in the English language. The task is divided into two subtasks, A and B.…”
Tables are widely used in various kinds of documents to present information concisely. Understanding tables is a challenging problem that requires an understanding of language and table structure, along with numerical and logical reasoning. In this paper, we present our systems to solve Task 9 of SemEval-2021: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACTS). The task consists of two subtasks: (A) Given a table and a statement, predicting whether the table supports the statement and (B) Predicting which cells in the table provide evidence for/against the statement. We fine-tune TAPAS (a model which extends BERT's architecture to capture tabular structure) for both the subtasks as it has shown state-of-the-art performance in various table understanding tasks. In subtask A, we evaluate how transfer learning and standardizing tables to have a single header row improves TAPAS' performance. In subtask B, we evaluate how different fine-tuning strategies can improve TAPAS' performance. Our systems achieve an F1 score of 67.34 in subtask A three-way classification, 72.89 in subtask A two-way classification, and 62.95 in subtask B.
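For readers unfamiliar with TAPAS, the snippet below is a minimal sketch of statement verification with a TAPAS checkpoint via the Hugging Face transformers library. It is not the authors' exact system; the checkpoint name and the two-way label mapping follow the publicly released TabFact-finetuned model and are assumptions here.

```python
import pandas as pd
import torch
from transformers import TapasForSequenceClassification, TapasTokenizer

# Assumed checkpoint: TAPAS fine-tuned on TabFact for binary statement verification.
model_name = "google/tapas-base-finetuned-tabfact"
tokenizer = TapasTokenizer.from_pretrained(model_name)
model = TapasForSequenceClassification.from_pretrained(model_name)

# TAPAS expects a flat table with a single header row and string-valued cells,
# which is one reason the paper standardizes tables before fine-tuning.
table = pd.DataFrame({"City": ["Paris", "Madrid"],
                      "Population (millions)": ["2.1", "3.3"]})
statement = "Madrid has a larger population than Paris."

inputs = tokenizer(table=table, queries=[statement], return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
label = logits.argmax(dim=-1).item()  # assumed mapping: 1 = supported, 0 = refuted
print("supported" if label == 1 else "refuted")
```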
“…This year, SemEval-2021 Task 9: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACT) aims to verify statements and find evidence from tables in scientific articles (Wang et al., 2021). It is an important task aimed at promoting proper interpretation of the surrounding article.…”
Section: Introduction (mentioning)
confidence: 99%
“…The task of verification from structured evidence, such as tables, charts, and databases, is still less explored. This paper describes sattiy team's system in SemEval-2021 task 9: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACT) (Wang et al, 2021). This competition aims to verify statements and to find evidence from tables for scientific articles and to promote the proper interpretation of the surrounding article.…”
Question answering from semi-structured tables can be seen as a semantic parsing task and is significant and practical for pushing the boundary of natural language understanding. Existing research mainly focuses on understanding content from unstructured evidence, e.g., news, natural language sentences, and documents. The task of verification from structured evidence, such as tables, charts, and databases, is still less explored. This paper describes the sattiy team's system in SemEval-2021 Task 9: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACT) (Wang et al., 2021). This competition aims to verify statements and to find evidence from tables for scientific articles and to promote the proper interpretation of the surrounding article. In this paper, we exploited ensemble models of pre-trained language models over tables, TaPas and TaBERT, for Task A and adjusted the results based on rules extracted for Task B. Finally, on the leaderboard, we attained F1 scores of 0.8496 and 0.7732 in Task A for the 2-way and 3-way evaluation, respectively, and an F1 score of 0.4856 in Task B.
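As a rough illustration of the soft-voting style of ensemble this abstract describes, the sketch below averages the class probabilities of two table models and takes the argmax. The `predict_proba_*` functions are hypothetical placeholders standing in for TaPas- and TaBERT-based classifiers; this is not the sattiy team's released code, and the label set is assumed to be the task's three-way scheme.

```python
import numpy as np

def ensemble_predict(statement, table, predict_proba_tapas, predict_proba_tabert,
                     labels=("refuted", "unknown", "supported")):
    """Average the probability distributions of two table models (soft voting)
    and return the highest-scoring label."""
    p1 = np.asarray(predict_proba_tapas(statement, table))   # shape: (num_labels,)
    p2 = np.asarray(predict_proba_tabert(statement, table))  # shape: (num_labels,)
    avg = (p1 + p2) / 2.0
    return labels[int(avg.argmax())]

# Toy usage with dummy probability functions in place of the real models.
dummy_tapas = lambda s, t: [0.1, 0.2, 0.7]
dummy_tabert = lambda s, t: [0.2, 0.3, 0.5]
print(ensemble_predict("example statement", None, dummy_tapas, dummy_tabert))  # "supported"
```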