Mining Knowledge for Natural Language Inference from Wikipedia Categories

Chen, Mingda; Chu, Zewei; Stratos, Karl; Gimpel, Kevin

doi:10.18653/v1/2020.findings-emnlp.313

Cited by 6 publications (6 citation statements)

References 41 publications

“…In future work, we want to combine efficiency with other highly desirable properties of evaluation metrics such as robustness (Vu et al, 2022; Chen and Eger, 2023; Rony et al, 2022) and explainability (Kaster et al, 2021; Sai et al, 2021; Fomicheva et al, 2021; Leiter et al, 2022) to induce metrics that jointly satisfy these criteria.…”

Section: Discussion (mentioning, confidence: 99%)

“…Evaluation metrics: Recent transformer-based metrics utilize BERT-based models like BERTScore (Zhang et al, 2020) and MoverScore (Zhao et al, 2019). Extensions include BARTScore (Yuan et al, 2021), which reads off probability estimates as metric scores directly from text generation systems, and MENLI (Chen and Eger, 2023), which uses probabilities from models fine-tuned on the Natural Language Inference task. These metrics are reference-based (comparing the MT output to a human reference), like BERTScore and MoverScore, or reference-free (comparing the MT output to the source text), like XMoverScore (Zhao et al, 2020) and SentSim (Song et al, 2021), and some are trained (fine-tuned on human scores) like COMET (Rei et al, 2020) while others are untrained, like BERTScore.…”

Section: Related Work (mentioning, confidence: 99%)

EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics

Larionov, Grünwald, Leiter et al. 2023

Findings of the Association for Computational Linguistics: EMNLP 2023

Efficiency is a key property to foster inclusiveness and reduce environmental costs, especially in an era of LLMs. In this work, we provide a comprehensive evaluation of efficiency for MT evaluation metrics. Our approach involves replacing computation-intensive transformers with lighter alternatives and employing linear and quadratic approximations for alignment algorithms on top of LLM representations. We evaluate six (reference-free and reference-based) metrics across three MT datasets and examine 16 lightweight transformers. In addition, we look into the training efficiency of metrics like COMET by utilizing adapters. Our results indicate that (a) TinyBERT provides the optimal balance between quality and efficiency, (b) CPU speed-ups are more substantial than those on GPU, (c) WMD approximations yield no efficiency gains while reducing quality, and (d) adapters enhance training efficiency (regarding backward pass speed and memory requirements) as well as, in some cases, metric quality. These findings can help to strike a balance between evaluation speed and quality, which is essential for effective NLG systems. Furthermore, our research contributes to the ongoing efforts to optimize NLG evaluation metrics with minimal impact on performance. To our knowledge, ours is the most comprehensive analysis of different aspects of efficiency for MT metrics conducted so far.

“…It considers contextual information and semantic similarity, providing a more nuanced and accurate evaluation of summary quality (Zhang et al, 2019). Chen and Eger (2023) introduce a novel approach by advocating the direct utilization of pretrained Natural Language Inference (NLI) models as evaluation metrics. Furthermore, they developed a novel preference-based adversarial test suite for machine translation and summarization metrics.…”

Section: Related Work (mentioning, confidence: 99%)

Reference-Free Summarization Evaluation with Large Language Models

Akkasi, Fraser, Komeili 2023

Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems

With the continuous advancement in unsupervised learning methodologies, text generation has become increasingly pervasive. However, the evaluation of the quality of the generated text remains challenging. Human annotations are expensive and often show high levels of disagreement, in particular for certain tasks characterized by inherent subjectivity, such as translation and summarization. Consequently, the demand for automated metrics that can reliably assess the quality of such generative systems and their outputs has grown more pronounced than ever. In 2023, Eval4NLP organized a shared task dedicated to the automatic evaluation of outputs from two specific categories of generative systems: machine translation and summarization. This evaluation was achieved through the utilization of prompts with Large Language Models. Participating in the summarization evaluation track, we propose an approach that involves prompting LLMs to evaluate six different latent dimensions of summarization quality. In contrast to many previous approaches to summarization assessments, which emphasize lexical overlap with reference text, this method surfaces the importance of correct syntax in summarization evaluation. Our method resulted in the second-highest performance in this shared task, demonstrating its effectiveness as a reference-free evaluation.

“…finds that utilizing ConceptNet as an external knowledge source can benefit entailment models in the scientific domain. Chen et al (2020b) propose WIKINLI, a large-scale naturally annotated dataset constructed from the Wikipedia category graph. They show that models pretrained on this dataset achieve better performance on downstream natural language entailment tasks.…”

Section: Modeling External Knowledge in NLP (mentioning, confidence: 99%)

Modeling Entity Knowledge for Fact Verification

Liu, Zhu, Zeng 2021

Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER)

Fact verification is a challenging task of identifying the truthfulness of given claims based on the retrieval of relevant evidence texts. Many claims require understanding and reasoning over external entity information for precise verification. In this paper, we propose a novel fact verification model using entity knowledge to enhance its performance. We retrieve descriptive text from Wikipedia for each entity, and then encode these descriptions with a smaller, lightweight network whose output is fed into the main verification model. Furthermore, we boost model performance by adopting and predicting the relatedness between the claim and each evidence as additional signals. We demonstrate experimentally on a large-scale benchmark dataset FEVER that our framework achieves competitive results with a FEVER score of 72.89% on the test set.
