Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data 2020
DOI: 10.1145/3318464.3380604
Learning to Validate the Predictions of Black Box Classifiers on Unseen Data

Cited by 25 publications (11 citation statements)
References 13 publications
“…We here aim for realistic modeling of these missingness patterns inspired by observations in large-scale real-world datasets as investigated in the work of Biessmann et al (2018) . We use an implementation proposed in the work of Schelter et al (2020) and Schelter et al (2021) , which selects two random percentiles of the values in a column, one for the lower and the other for the upper bound of the value range considered. In the MAR condition, we discard values if values in a random other column fall in that percentile.…”
Section: Methods (citation type: mentioning)
confidence: 99%
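The percentile-based MAR injection quoted above can be sketched in Python. This is a hypothetical illustration of the described mechanism, not the implementation of Schelter et al.; the function name and the exact sampling choices are assumptions.

```python
import numpy as np
import pandas as pd

def inject_mar_missingness(df, target_col, seed=None):
    """Sketch of percentile-based MAR injection: pick a random *other*
    column, draw two random percentiles as lower/upper bounds of a value
    range, and blank out target_col on the rows whose values in that
    other column fall inside the range. (Hypothetical implementation.)"""
    rng = np.random.default_rng(seed)
    df = df.copy()
    other_cols = [c for c in df.columns if c != target_col]
    dep_col = rng.choice(other_cols)               # column the missingness depends on
    lo, hi = np.sort(rng.uniform(0, 100, size=2))  # two random percentiles
    lower, upper = np.percentile(df[dep_col], [lo, hi])
    mask = df[dep_col].between(lower, upper)
    df.loc[mask, target_col] = np.nan              # MAR: depends only on observed dep_col
    return df
```

Because the discarded entries depend only on the fully observed dependent column, the resulting pattern is missing-at-random rather than missing-completely-at-random.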
“…More recently, Gopakumar et al (2018) suggested searching for the worst-case model performance using limited labeled data; however, we posit that using the worst case to assess the goodness of a model-under-test is overkill, because the worst case is often just an outlier. The work in Schelter et al (2020) learns to validate the model without labeled data by generating a synthetic dataset representative of the deployment data. The restrictive assumption is that it requires domain experts to provide a set of data generators, a task usually infeasible in reality.…”
Section: A Related Work (citation type: mentioning)
confidence: 99%
“…Further, it does not differentiate between different types of uncertainty. Schelter et al [32] proposed a model-agnostic validation approach to detect data-related errors at serving time. However, this work focuses on errors arising from data-processing issues, such as missing values or incorrectly entered values, and relies on programmatic specification of typical data errors.…”
Section: Related Work (citation type: mentioning)
confidence: 99%