2021
DOI: 10.48550/arxiv.2104.07555
Preprint

Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation

Abstract: In this paper, we explore how QUESTEVAL, which is a Text-vs-Text metric, can be adapted for the evaluation of Data-to-Text Generation systems. QUESTEVAL is a referenceless metric that compares the predictions directly to the structured input data by automatically asking and answering questions. Its adaptation to Data-to-Text is not straightforward as it requires multi-modal Question Generation and Answering (QG & QA) systems. To this purpose, we propose to build synthetic multi-modal corpora that enables to tr…
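To make the QG & QA loop described in the abstract concrete, the snippet below is a minimal Python sketch of one direction of such a referenceless score: questions are generated from the prediction, answered against the structured input, and the two answers are compared. This is not the authors' Data-QuestEval implementation. The generate_questions and answer_from_table callables are hypothetical placeholders for the multi-modal QG & QA models the paper trains on synthetic corpora, and token-level F1 stands in for whatever answer comparison the real metric uses.

```python
# Minimal sketch of a QG & QA based referenceless score (not the official
# Data-QuestEval code). The QG and QA models are passed in as callables and
# are hypothetical placeholders here.
from collections import Counter
from typing import Callable, Dict, List


def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer."""
    pred_tokens, gold_tokens = pred.lower().split(), gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def referenceless_score(
    table: Dict[str, str],                                     # structured input data
    prediction: str,                                           # generated text to evaluate
    generate_questions: Callable[[str], List[Dict[str, str]]],
    answer_from_table: Callable[[str, Dict[str, str]], str],
) -> float:
    """Ask questions about the prediction and answer them from the table.

    The score is the mean answer agreement: it is high when the prediction
    only states facts that the structured input supports.
    """
    qa_pairs = generate_questions(prediction)                  # [{"question": ..., "answer": ...}]
    if not qa_pairs:
        return 0.0
    scores = [
        token_f1(answer_from_table(pair["question"], table), pair["answer"])
        for pair in qa_pairs
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy rule-based stand-ins for the learned QG & QA models, for illustration only.
    table = {"name": "Alan Turing", "birth_year": "1912"}
    prediction = "Alan Turing was born in 1912."
    qg = lambda text: [{"question": "When was Alan Turing born?", "answer": "1912"}]
    qa = lambda question, tbl: tbl["birth_year"]
    print(referenceless_score(table, prediction, qg, qa))      # 1.0
```

The actual metric relies on trained multi-modal QG & QA models rather than toy callables, and a complete QuestEval-style score also considers questions generated from the data and answered on the prediction; the sketch above only illustrates the overall scoring loop.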

Cited by 4 publications (5 citation statements)
References 17 publications
“…Hallucination detection. Existing research primarily contains statistical metrics [28,74,80], model-based metrics (including Information Extraction (IE)-based metric, QA-based metric [32,65,68], Natural Language Inference (NLI) Metrics [33,38,81], Faithfulness Classification Metrics [32,48,89], LM-based Metrics [26,75]), and human-based evaluations [69,73]. We list some typical work as follows: Dhingra et al [22] propose PARENT to measure hallucinations using both the source and target text as references.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
“…With the success of neural techniques in text generation tasks, applying neural sequence-to-sequence generation models became more common (Du et al., 2017; Sun et al., 2018). More recent works leverage pre-trained transformer-based networks, such as T5 (Raffel et al., 2020), BART (Lewis et al., 2019), PEGASUS and ProphetNet (Yan et al., 2020b), for question generation which have been successful in many applications (Dong et al., 2019b; Lelkes et al., 2021; Rebuffel et al., 2021; Pan et al., 2021).…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
“…Question generation (QG) aims to automatically create questions from a given text passage or document with or without answers. It has a wide range of applications such as improving question answering (QA) systems (Duan et al., 2017) and search engines (Han et al., 2019) through data augmentation, making chatbots more engaging (Laban et al., 2020), enabling automatic evaluation (Rebuffel et al., 2021) and fact verification (Pan et al., 2021), and facilitating educational applications (Chen et al., 2018).…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…BARTScore (Yuan et al., 2021): Because ROUGE scores only measure token overlap, other automated metrics (Rebuffel et al., 2021; Kryscinski et al., 2020; Wang et al., 2020; … et al., 2005) […] and SAMSum (Gliwa et al., 2019) datasets. We adopt some results reported from the literature (Feng et al., 2021a) and implement the pre-trained models for a fair comparison.…”
Section: Evaluation Metrics
Citation type: mentioning (confidence: 99%)