2022
DOI: 10.48550/arxiv.2212.09272
Preprint

Statistical Dataset Evaluation: Reliability, Difficulty, and Validity

Cited by 2 publications (2 citation statements)
References 0 publications
“…To understand the performance of the methods on the datasets, we calculated the difficulty of the datasets and the similarity between their train and test sets. As difficulty metrics, we use two metrics: Entity Ambiguity Degree (EAD) and Text Complexity (TC) (Wang et al., 2022a). We also use Target Vocabulary Covered (TVC) as a similarity metric (Dai et al., 2019).…”
Section: A Dataset Details
confidence: 99%
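
The exact definitions of EAD, TC, and TVC are given in Wang et al. (2022a) and Dai et al. (2019). As an illustration of the train/test similarity idea behind TVC, here is a minimal sketch, assuming TVC is the fraction of distinct test-set tokens that also appear in the training set; the published definition may differ, e.g. by restricting the count to entity tokens.

    def target_vocabulary_covered(train_tokens, test_tokens):
        """Hypothetical sketch of Target Vocabulary Covered (TVC):
        the share of distinct test-set tokens also seen in training.
        The exact formulation in Dai et al. (2019) may restrict this
        to target-entity vocabulary."""
        train_vocab = set(train_tokens)
        test_vocab = set(test_tokens)
        if not test_vocab:
            return 0.0
        return len(test_vocab & train_vocab) / len(test_vocab)

    # Toy usage with made-up token lists:
    train = ["aspirin", "reduces", "fever", "and", "pain"]
    test = ["aspirin", "treats", "pain"]
    print(target_vocabulary_covered(train, test))  # 2 of 3 test types covered -> ~0.67

A higher TVC means the test vocabulary is better covered by the training data, so a low score flags a harder, more out-of-vocabulary evaluation split.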
“…We further show that near-duplicate analysis is useful in at least two ways. First, it allows us to inspect and refine a dataset, in a manner similar to measuring data (Wang et al., 2022; Mitchell et al., 2022, inter alia), by identifying phenomena that might otherwise go unnoticed, e.g. texts that are assigned to different classes but have no actual dialectal differences, or spotting artefacts due to the selection of text sources or to the processing pipeline (e.g.…”
Section: Introduction
confidence: 99%