2019
DOI: 10.1080/10618600.2019.1660180

Valid Inference Corrected for Outlier Removal

Abstract: Ordinary least squares (OLS) estimation of a linear regression model is well known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the data and (2) fit OLS and form confidence intervals and p-values on the remaining data as if this were the original data collected. This standard "detect-and-forget" approach has been shown to be problematic, and in this paper we highlight the fact that it can lead to invalid inference and show how recently developed …
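The undercoverage caused by detect-and-forget is easy to see in a small Monte Carlo experiment. The following is a minimal illustrative sketch (not code from the paper), assuming normal errors, no true outliers, and a crude rule that drops the points with the largest absolute OLS residuals before refitting and forming the usual normal-approximation intervals:

import numpy as np

rng = np.random.default_rng(0)
n, beta, sigma, n_rep = 50, 1.0, 1.0, 2000
covered = 0
for _ in range(n_rep):
    x = rng.normal(size=n)
    y = beta * x + sigma * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    keep = np.abs(resid) < 1.5 * resid.std()          # crude "outlier" rule
    Xk, yk = X[keep], y[keep]
    coef_k, *_ = np.linalg.lstsq(Xk, yk, rcond=None)  # refit on cleaned data
    rk = yk - Xk @ coef_k
    s2 = rk @ rk / (len(yk) - 2)                      # naive residual variance
    se = np.sqrt(s2 * np.linalg.inv(Xk.T @ Xk)[1, 1])
    covered += (coef_k[1] - 1.96 * se <= beta <= coef_k[1] + 1.96 * se)
print("realized coverage:", covered / n_rep)          # typically below the nominal 0.95

Because the screening step discards exactly the points that inflate the residual variance, the post-removal standard error is biased downward and the realized coverage of the nominal 95% intervals typically falls below 95% even though the data contain no true outliers.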

Cited by 30 publications (28 citation statements)
References: 35 publications
“…where F(·; 0, ‖ν‖₂²σ², S) denotes the cumulative distribution function of the N(0, ‖ν‖₂²σ²) distribution truncated to the set S. In Section 4, we provide an efficient approach for analytically characterizing the truncation set S_λ^sib(ν_sib). To avoid numerical issues associated with the truncated normal, we compute (11) using methods described in Chen and Bien [2020].…”
Section: Inference on a Pair of Sibling Regions
Citation type: mentioning; confidence: 99%
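The quantity in the statement above is the CDF of a mean-zero normal truncated to a set S, which in these methods is typically a union of intervals. Below is a minimal numerical sketch (not the numerically robust method of Chen and Bien [2020] referenced in the statement; the function name is hypothetical):

import numpy as np
from scipy.stats import norm

def truncated_normal_cdf(x, tau, intervals):
    # CDF at x of a N(0, tau^2) variable truncated to a union of
    # disjoint intervals [(a1, b1), (a2, b2), ...]; illustrative helper.
    def mass(lo, hi):
        # P(lo < Z <= hi); using the survival function keeps the two terms
        # comparable in size when the interval lies in the right tail
        return norm.sf(lo, scale=tau) - norm.sf(hi, scale=tau)
    total = sum(mass(a, b) for a, b in intervals)
    below = sum(mass(a, min(b, x)) for a, b in intervals if a < x)
    return below / total

# e.g. the conditional probability that Z <= 2.5 given Z lies in S
# print(truncated_normal_cdf(2.5, 1.0, [(-1.0, 0.5), (2.0, 4.0)]))

When S sits very far in either tail, even these direct differences underflow or cancel, which is exactly the numerical issue that motivates the more careful computations the citing authors refer to.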
“…We prove Theorem 1 in Appendix S1.1. Related results have been used to develop selective inference frameworks for regression (Loftus & Taylor 2015, Yang et al. 2016) and outlier detection (Chen & Bien 2020). It follows from Theorem 1 that to compute the p-value defined in (7), it suffices to characterize the one-dimensional set…”
Section: A Test of No Difference in Means Between Two Clusters
Citation type: mentioning; confidence: 99%
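In selective inference frameworks of this kind, the selection event is often polyhedral, {Ay ≤ b}, and the one-dimensional set referred to above is the range of the test statistic ν'y that keeps the event satisfied once the component of y orthogonal to ν is held fixed. A minimal sketch of that standard algebra, in the spirit of the polyhedral lemma commonly used in this literature (the function name and the isotropic-covariance assumption are mine, not taken from the cited papers):

import numpy as np

def truncation_interval(y, A, b, nu):
    # Interval of values of nu'y compatible with the selection event
    # {A y <= b}, holding the component of y orthogonal to nu fixed.
    # Assumes y ~ N(mu, sigma^2 * I); hypothetical helper for illustration.
    nu = np.asarray(nu, dtype=float)
    c = nu / (nu @ nu)               # direction along which nu'y varies
    stat = nu @ y
    z = y - c * stat                 # part of y independent of nu'y
    alpha = A @ c
    resid = b - A @ z
    lower, upper = -np.inf, np.inf
    if np.any(alpha < 0):
        lower = np.max(resid[alpha < 0] / alpha[alpha < 0])
    if np.any(alpha > 0):
        upper = np.min(resid[alpha > 0] / alpha[alpha > 0])
    return lower, upper              # the one-dimensional truncation set

The selective p-value is then a tail probability of nu'y under a normal distribution truncated to [lower, upper], or to a union of such intervals when the selection event is a union of polyhedra.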
“…When standard methods are applied to the cleaned data, the resulting standard errors do not include the uncertainty from the data-cleaning step, such that the standard errors of the two-step approach are underestimated. For instance, Chen & Bien (2017) show that OLS regression after outlier removal results in confidence intervals that are much too small as they do not possess the nominal coverage. Consequently, the p-values from significance tests are too small and could incorrectly suggest significant results.…”
Section: Robust Statistics
Citation type: mentioning; confidence: 99%