Experimental Standards for Deep Learning in Natural Language Processing Research

Ulmer, Dennis; Bassignana, Elisa; Müller-Eberstein, Max; Varab, Daniel; Zhang, Mike; Hardmeier, Christian; Plank, Barbara

doi:10.48550/arxiv.2204.06251

Cited by 2 publications

(3 citation statements)

References 55 publications


“…In our opinion, there are some fundamental steps that need to be made by researchers in order to improve experimental analysis and, ultimately, make progress in science. In this sense, we share the ideas discussed by [36], and we believe that there are some necessary best practices regarding choice of data, source code, models, experimental setting, and analysis that should be documented in any research paper that presents experimental results.…”

Section: How Does This Study Reflect On Current Knowledge About Issue... (mentioning)

confidence: 80%

“…We believe that only a better documentation, both in the research paper and in the source code, could overcome most (if not all) the issues we encountered in this reproducibility study. The best practices suggested by [36] are one of the best starting points for a checklist of all the things a researcher should take into account "before" any experimental analysis. In order to mitigate issues like the ones related to the stopping criterion (an approach that is described in the paper but that is missing in the code), we believe that only a more accurate check on all the steps is the solution.…”

Section: What Are the Problems And Challenges Encountered? (mentioning)

confidence: 99%


A Thorough Reproducibility Study on Sentiment Classification: Methodology, Experimental Setting, Results

Nunzio

Minzoni

2023

Information


A survey published by Nature in 2016 revealed that more than 70% of researchers failed in their attempt to reproduce another researcher’s experiments, and over 50% failed to reproduce one of their own experiments; a state of affairs that has been termed the ‘reproducibility crisis’ in science. The purpose of this work is to contribute to the field by presenting a reproducibility study of a Natural Language Processing paper about “Language Representation Models for Fine-Grained Sentiment Classification”. A thorough analysis of the methodology, experimental setting, and experimental results is presented, leading to a discussion of the issues and the necessary steps involved in this kind of study.


“…Weights & Biases (Biewald, 2020) was used to track and manage hyperparameter searches and experiments. In general, we follow many of the experimental guidelines and suggestions laid out by Ulmer et al (2022a).…”

Section: B Calibration Metrics (mentioning)

confidence: 99%

Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity

Ulmer

Frellsen

Hardmeier

2022

Findings of the Association for Computational Linguistics: EMNLP 2022


We investigate the problem of determining the predictive confidence (or, conversely, uncertainty) of a neural classifier through the lens of low-resource languages. By training models on sub-sampled datasets in three different languages, we assess the quality of estimates from a wide array of approaches and their dependence on the amount of available data. We find that while approaches based on pre-trained models and ensembles achieve the best results overall, the quality of uncertainty estimates can surprisingly suffer with more data. We also perform a qualitative analysis of uncertainties on sequences, discovering that a model's total uncertainty seems to be influenced to a large degree by its data uncertainty, not model uncertainty. All model implementations are open-sourced in a software package. The model zoo is available under https://github.com/Kaleidophon/nlp-uncertainty-zoo, with the code for the experiments available under https://github.com/Kaleidophon/nlp-low-resource-uncertainty.

