2015
DOI: 10.1126/science.aaa9375

The reusable holdout: Preserving validity in adaptive data analysis

Abstract: Misapplication of statistical data analysis is a common cause of spurious discoveries in scientific research. Existing approaches to ensuring the validity of inferences drawn from data assume a fixed procedure to be performed, selected before the data are examined. In common practice, however, data analysis is an intrinsically adaptive process, with new analyses generated on the basis of data exploration, as well as the results of previous analyses on the same data. We demonstrate a new approach for addressing…

Cited by 267 publications (247 citation statements)
References 18 publications (18 reference statements)
“…Nevertheless, we point out that to confirm the stability of our model and in order to ensure the reproducibility of the data, it will be needed to extend the number of observations included in the training set and to collect a further sample of naïve participants to test the model with an out-of-sample procedure [28].…”
Section: Discussion
mentioning
confidence: 99%
“…Organizers need to find ways to prevent data leakage and overfitting [83], such as by limiting the number of submissions to the leaderboard or limiting the information revealed by the leaderboard [84].…”
mentioning
confidence: 99%
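
The two safeguards named in the statement above (capping submissions and limiting what the leaderboard reveals) can be sketched as follows. This is a minimal illustration, not the method of the cited paper or of any particular competition platform: the class `LimitedLeaderboard` and its parameters `max_submissions` and `steps` are hypothetical names chosen for this example, and coarse rounding is only one assumed way of limiting the information revealed.

```python
from dataclasses import dataclass, field


@dataclass
class LimitedLeaderboard:
    """Hypothetical leaderboard that leaks little about hidden test labels."""

    labels: list              # hidden test labels, never shown to participants
    max_submissions: int = 5  # assumed cap on submissions per participant
    steps: int = 20           # assumed granularity: scores reported in 1/steps
    _counts: dict = field(default_factory=dict)

    def submit(self, participant: str, predictions: list) -> float:
        """Score a submission and return only a coarsely rounded accuracy."""
        used = self._counts.get(participant, 0)
        if used >= self.max_submissions:
            raise RuntimeError(f"{participant} has no submissions left")
        if len(predictions) != len(self.labels):
            raise ValueError("prediction count does not match the test set")
        self._counts[participant] = used + 1

        correct = sum(p == y for p, y in zip(predictions, self.labels))
        accuracy = correct / len(self.labels)
        # Snap the score to the nearest multiple of 1/steps, so that small
        # changes in accuracy are not visible to the participant.
        return round(accuracy * self.steps) / self.steps
```

With a coarse grid, two submissions that differ by a single correct prediction can receive the same reported score, which leaves a participant less to adapt to, and the submission cap bounds how many such probes are possible at all.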
“…They identified stability as the key necessary and sufficient condition for learnability. Recently, algorithmic stability has been also connected to differential privacy [40], (robust and perfect) generalization [6], adaptive data analysis [41], adaptive learning and compression schemes [42].…”
mentioning
confidence: 99%