Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems 2020
DOI: 10.18653/v1/2020.eval4nlp-1.15
ClusterDataSplit: Exploring Challenging Clustering-Based Data Splits for Model Performance Evaluation

Abstract: This paper adds to the ongoing discussion in the natural language processing community on how to choose a good development set. Motivated by the real-life necessity of applying machine learning models to different data distributions, we propose a clustering-based data splitting algorithm. It creates development (or test) sets which are lexically different from the training data while ensuring similar label distributions. Hence, we are able to create challenging cross-validation evaluation setups while abstract…
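The abstract only outlines the procedure. A minimal sketch of the general idea (not the authors' implementation) is given below, assuming TF-IDF features, k-means clustering, and a simple greedy label-balancing heuristic as placeholder choices:

```python
# Hedged sketch of a clustering-based data split: documents are clustered on
# lexical TF-IDF features so held-out folds are lexically distant from the
# training data, and whole clusters are assigned to folds greedily to keep
# fold sizes and label distributions roughly balanced.
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_based_folds(texts, labels, n_folds=5, n_clusters=50, seed=0):
    """Assign each document to a fold via lexical clustering.

    Returns an array of fold indices, one per document.
    """
    tfidf = TfidfVectorizer(lowercase=True, max_features=20000)
    X = tfidf.fit_transform(texts)
    clusters = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(X)

    folds = np.zeros(len(texts), dtype=int)
    fold_label_counts = [Counter() for _ in range(n_folds)]
    fold_sizes = [0] * n_folds

    # Place each whole cluster into the fold where it least inflates the
    # fold size and the counts of the labels it carries (greedy balancing).
    for c in range(n_clusters):
        idx = np.where(clusters == c)[0]
        cluster_labels = Counter(labels[i] for i in idx)
        best_fold = min(
            range(n_folds),
            key=lambda f: fold_sizes[f]
            + sum(fold_label_counts[f].get(l, 0) for l in cluster_labels),
        )
        folds[idx] = best_fold
        fold_label_counts[best_fold].update(cluster_labels)
        fold_sizes[best_fold] += len(idx)
    return folds
```

Because entire clusters, rather than individual documents, are assigned to folds, each held-out fold contains vocabulary the training folds have not seen, which is what makes the resulting cross-validation setup more challenging than a random split.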

Cited by 1 publication (3 citation statements) · References 19 publications
“…Further, there is a line of work now questioning traditional train-dev splits [6] as well as random splits [16]. More challenging data splits can be created by clustering the documents based on their similarity, where each split encodes unique information to a certain degree [18]. We use this method to train ensembles of models on these splits in a cross-validation format, such that each model has observed slightly different training instances.…”
Section: Related Work (citation type: mentioning, confidence: 99%)
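The citing work trains one model per cluster-based fold and combines them. A rough sketch of that usage, assuming a hypothetical `train_model` callable, integer-encoded labels, and fold assignments such as those produced above, might look like:

```python
# Hedged sketch of ensembling over cluster-based folds: each model is trained
# with one fold held out, so every model sees slightly different training
# data; test predictions are combined by majority vote.
import numpy as np


def train_cv_ensemble(X, y, folds, train_model):
    """Train one model per held-out fold (y assumed to be a NumPy array)."""
    models = []
    for f in np.unique(folds):
        train_idx = np.where(folds != f)[0]
        models.append(train_model(X[train_idx], y[train_idx]))
    return models


def ensemble_predict(models, X_test):
    """Majority vote over the per-fold models (labels assumed non-negative ints)."""
    preds = np.stack([m.predict(X_test) for m in models])
    return np.array([np.bincount(col).argmax() for col in preds.T.astype(int)])
```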
“…Then, we train the model using only the train-fraction of all the data and use the held-out validation data to determine the best model, which is then used to annotate the test data. As an alternative to random splits, we follow [18] and create strategic data splits by clustering the documents according to their similarity. This creates more challenging splits, as more distant documents are left out for validation.…”
Section: Strategic Datasplits (citation type: mentioning, confidence: 99%)