Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) 2019
DOI: 10.18653/v1/k19-1087

On Model Stability as a Function of Random Seed

Abstract: In this paper, we focus on quantifying model stability as a function of random seed by investigating the effects of the induced randomness on model performance and the robustness of the model in general. We specifically perform a controlled study on the effect of random seeds on the behaviour of attention, gradient-based and surrogate-model-based (LIME) interpretations. Our analysis suggests that random seeds can adversely affect the consistency of models, resulting in counterfactual interpretations. We propose …
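As a concrete illustration of the kind of experiment the abstract describes, below is a minimal Python sketch, assuming a PyTorch setup, of measuring how test accuracy varies across random seeds. The function train_and_evaluate and its placeholder body are hypothetical stand-ins for a fixed architecture, dataset, and training loop; this is not the authors' released code.

    # Minimal sketch (assumed setup, not the paper's code): quantify the
    # accuracy spread across random seeds for one fixed architecture.
    import random

    import numpy as np
    import torch


    def train_and_evaluate(seed: int) -> float:
        """Train one model with all randomness pinned to `seed` and
        return its test accuracy (placeholder logic)."""
        # Pin every source of randomness to the given seed.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        # ... build the model, train it, evaluate on a held-out set ...
        accuracy = 0.0  # replace with a real evaluation
        return accuracy


    seeds = [1, 7, 42, 123, 2019]
    scores = [train_and_evaluate(s) for s in seeds]
    print(f"mean={np.mean(scores):.4f}  std={np.std(scores):.4f}  "
          f"spread={max(scores) - min(scores):.4f}")

The spread (max minus min) across seeds is one simple stability measure; the paper's analysis goes further, examining how seeds affect the consistency of attention, gradient-based, and LIME interpretations.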

Cited by 39 publications (24 citation statements). References 23 publications.

Citation statements (ordered by relevance):
“…Several works have noted that the same architecture can have very different in-distribution generalization across restarts of the same training process (Reimers and Gurevych, 2017, 2018; Madhyastha and Jain, 2019). Most relevantly for our work, fine-tuning of BERT is unstable for some datasets, such that some runs achieve state-of-the-art results while others perform poorly (Devlin et al., 2019; Phang et al., 2018).…”
Section: In-distribution Generalization
confidence: 75%
“…While this effect is common to all supervised machine learning models, it gets amplified in our case due to the large imbalance and low abundance of annotations for training. With periodic checks during training, a stable model state can be achieved, but further work may attempt to improve model stability by, for example, adding regularizers or incorporating more advanced weighting schemes (Madhyastha & Jain, 2019).…”
Section: Limitations and Future Perspectives
confidence: 99%
“…Second, using a categorical feature to denote model types constrains its expressive power for modeling performance. In reality, a slight change in model hyperparameters (Hoos and Leyton-Brown, 2014; Probst et al., 2019), optimization algorithms (Kingma and Ba, 2014), or even random seeds (Madhyastha and Jain, 2019) may give rise to a significant variation in performance, which our predictor is not able to capture. While investigating the systematic implications of model structures or hyperparameters is practically infeasible in this study, we may use additional information such as textual model descriptions to model NLP models and training procedures more elaborately in the future.…”
Section: Discussion
confidence: 99%