Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
DOI: 10.18653/v1/d18-1487

Uncertainty-aware generative models for inferring document class prevalence

Abstract: Prevalence estimation is the task of inferring the relative frequency of classes of unlabeled examples in a group (for example, the proportion of a document collection with positive sentiment). Previous work has focused on aggregating and adjusting discriminative individual classifiers to obtain prevalence point estimates. But imperfect classifier accuracy ought to be reflected in uncertainty over the predicted prevalence for scientifically valid inference. In this work, we present (1) a generative probabilistic…

Cited by 21 publications (13 citation statements)
References 30 publications

“…It is worth emphasizing that here we focus on estimating the prevalence using only human labels and assume that we do not have access to the whole unlabeled population. This is in contrast to the body of research on prevalence estimation [6, 22], also known as quantification [3, 13-15, 29] or class prior estimation [34, 38], which uses supervised learning to train a classifier and make predictions on unlabeled data to infer the prevalence in the population.…”
Section: Prevalence Measurement
confidence: 99%
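The classifier-based quantification line of work contrasted in the statement above can be illustrated with a minimal adjusted classify-and-count (ACC) sketch. The function name and the toy error rates (`tpr=0.8`, `fpr=0.1`) are illustrative assumptions, not details taken from the cited works:

```python
import numpy as np

def adjusted_classify_and_count(preds_unlabeled, tpr, fpr):
    """Adjusted classify-and-count (ACC) prevalence estimate.

    preds_unlabeled: binary classifier decisions on the unlabeled pool.
    tpr, fpr: the classifier's true/false positive rates, typically
    estimated on held-out labeled data.
    """
    cc = np.mean(preds_unlabeled)      # raw classify-and-count estimate
    if tpr == fpr:                     # degenerate classifier: no correction possible
        return float(cc)
    acc = (cc - fpr) / (tpr - fpr)     # invert the classifier's error rates
    return float(np.clip(acc, 0.0, 1.0))

# Toy illustration: true prevalence 0.30, classifier with tpr=0.8, fpr=0.1
rng = np.random.default_rng(0)
n = 100_000
labels = rng.random(n) < 0.30
preds = np.where(labels, rng.random(n) < 0.8, rng.random(n) < 0.1)
estimate = adjusted_classify_and_count(preds, tpr=0.8, fpr=0.1)  # close to 0.30
```

The correction inverts E[cc] = tpr * p + fpr * (1 - p), which is why the raw count (here about 0.31) is pulled back toward the true prevalence.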
“…Post-prediction inference appears across fields and has been recognized as a potential source of error in recent work on prevalence estimation (see for example [20] and [21] in the context of data set shift and [22] in document class prevalence estimation). Here, we focus on developing analytical and bootstrap-based approaches to correct regression estimates, standard errors, and test statistics in inferential regression models using predicted outcomes.…”
Section: Introduction
confidence: 99%
“…Hopkins and King (2010) routinely provided confidence intervals for their estimates "via standard bootstrapping procedures", without commenting much on details of the procedures or on any issues encountered with them. Keith and O'Connor (2018) proposed and compared a number of methods for constructing such confidence intervals. Some of these methods involve Monte Carlo simulation and some do not.…”
Section: Introduction
confidence: 99%
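The "standard bootstrapping procedures" mentioned above can be sketched as a minimal percentile bootstrap over a labeled sample; this is an assumed, generic form of the procedure, not the specific interval construction of either cited paper:

```python
import numpy as np

def bootstrap_prevalence_interval(labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the positive-class prevalence."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    # Resample the labeled sample with replacement and recompute the prevalence
    boots = [np.mean(rng.choice(labels, size=labels.size, replace=True))
             for _ in range(n_boot)]
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# 500 labeled documents drawn with roughly 30% positives
rng = np.random.default_rng(1)
sample = (rng.random(500) < 0.30).astype(int)
lo, hi = bootstrap_prevalence_interval(sample)
```

The percentile bootstrap captures only labeling-sample variability; as the surrounding statements note, additional uncertainty from imperfect classifiers requires further adjustment.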
“…• Would it be worthwhile to distinguish confidence and prediction intervals for class prevalences and deploy different methods for their estimation? This question is raised against the backdrop that, for instance, Keith and O'Connor (2018) talked about estimating confidence intervals but in fact constructed prediction intervals, which are conceptually different (Meeker et al., 2017).…”
Section: Introduction
confidence: 99%
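The distinction raised above can be made concrete with a normal-approximation sketch: a confidence interval targets the underlying class probability, while a prediction interval targets the realized prevalence in a specific finite batch of new documents, so it additionally carries binomial sampling noise and is wider. All numbers here are hypothetical:

```python
import numpy as np

theta_hat, n, m = 0.30, 1000, 200   # point estimate, labeled-sample size, future batch size

# Confidence interval: uncertainty about the population parameter theta
se_theta = np.sqrt(theta_hat * (1 - theta_hat) / n)
ci = (theta_hat - 1.96 * se_theta, theta_hat + 1.96 * se_theta)

# Prediction interval: also includes binomial noise in a batch of m new documents
se_pred = np.sqrt(theta_hat * (1 - theta_hat) * (1 / n + 1 / m))
pi = (theta_hat - 1.96 * se_pred, theta_hat + 1.96 * se_pred)
```

Because `se_pred > se_theta` whenever the future batch is finite, the prediction interval strictly contains the confidence interval, which is why conflating the two understates uncertainty about a batch-level prevalence.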