2019
DOI: 10.48550/arxiv.1906.12125
Preprint

High-dimensional principal component analysis with heterogeneous missingness

Abstract: We study the problem of high-dimensional Principal Component Analysis (PCA) with missing observations. In simple, homogeneous missingness settings with a noise level of constant order, we show that an existing inverse-probability weighted (IPW) estimator of the leading principal components can (nearly) attain the minimax optimal rate of convergence. However, deeper investigation reveals both that, particularly in more realistic settings where the missingness mechanism is heterogeneous, the empirical performanc…
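The IPW weighting idea described in the abstract can be sketched in a few lines. Everything below (the function name `ipw_pca`, the homogeneous-missingness assumption, the plug-in estimate of the observation probability) is illustrative only and is not taken from the paper:

```python
import numpy as np

def ipw_pca(Y, k):
    """Minimal sketch of an inverse-probability weighted (IPW) PCA
    estimator, assuming homogeneous missingness: every entry of Y is
    observed independently with the same (unknown) probability p.
    Y is (n, d) with np.nan marking missing entries; returns the k
    leading eigenvectors of a debiased second-moment estimate.
    """
    n, d = Y.shape
    obs = ~np.isnan(Y)                 # observation mask
    p_hat = obs.mean()                 # plug-in estimate of p
    Z = np.where(obs, Y, 0.0)          # zero-fill missing entries
    G = Z.T @ Z / n                    # second-moment matrix of zero-filled data
    # Debias by inverse probabilities: an off-diagonal entry survives
    # zero-filling with probability p^2 (two observations needed), a
    # diagonal entry with probability p (one observation).
    S = G / p_hat ** 2
    np.fill_diagonal(S, np.diag(G) / p_hat)
    vals, vecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]        # k leading eigenvectors
```

For heterogeneous missingness, the abstract's point is precisely that a single pooled probability estimate is inadequate; entry-wise observation probabilities would be needed in the reweighting step.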

Cited by 22 publications (25 citation statements)
References 42 publications
“…This is a generalization of the MAR setting as such an approach circumvents the requirement of meaningful auxiliary features X to conduct propensity score estimation. Additional works within the MNAR literature include Zhu et al (2019); Sportisse et al (2020a,b); Wang et al (2020).…”
Section: Missing Not At Random (MNAR): MNAR is the most challenging mi…
confidence: 99%
“…That is, the entries are missing not at random (MNAR). To address the above challenges, there has been exciting recent progress on matrix completion with MNAR data, including Schnabel et al (2016); Ma and Chen (2019); Zhu et al (2019); Sportisse et al (2020a,b); Wang et al (2020); Yang et al (2021); Bhattacharya and Chatterjee (2021). Through numerous empirical studies, these works have shown that algorithms that account for MNAR data outperform conventional algorithms that are designed for MCAR data.…”
Section: Introduction
confidence: 99%
“…Inadequacy of prior works. While methods for estimating principal subspace are certainly not in shortage (e.g., Balzano et al (2018); Cai et al (2021); Cai and Zhang (2018); Li et al (2021); Lounici (2014); Zhang et al (2018); Zhu et al (2019)), methods for constructing confidence regions for principal subspace remain vastly under-explored. The fact that the estimators in use for PCA are typically nonlinear and nonconvex presents a substantial challenge in the development of a distributional theory, let alone uncertainty quantification.…”
Section: Problem Formulation
confidence: 99%
“…Several useful extensions have been developed tailored to high-dimensional statistical applications, particularly when the perturbation matrix of interest enjoys certain random structure [O'Rourke et al., 2018, Vu, 2011, Wang, 2015, Xia, 2019, Yu et al., 2015]. In particular, the ℓ2 perturbation bounds for the eigenvector (or eigenspace) of the sample covariance matrix have been extensively studied in the PCA literature, e.g., [Johnstone and Lu, 2009, Lounici, 2013, 2014, Nadler, 2008, Zhu et al., 2019]. Another line of works [O'Rourke et al., 2018, Vu, 2011] improved Davis-Kahan's and Wedin's theorems in the matrix denoising setting with small eigengaps, which, however, is not tight unless the spectral norm of the noise matrix H is extremely small.…”
Section: Related Work
confidence: 99%
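The perturbation bounds discussed in the statement above can be illustrated numerically. The sketch below demonstrates the standard Davis-Kahan sin-theta inequality (in the Yu-Wang-Samworth form); it is a generic textbook example, not code from any of the cited works:

```python
import numpy as np

# Davis-Kahan-type bound: for a symmetric matrix A with top eigengap
# delta = lambda_1 - lambda_2 and a symmetric perturbation E,
#     sin angle(v1(A), v1(A + E)) <= 2 * ||E||_op / delta.
rng = np.random.default_rng(1)
d = 50
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]       # random orthogonal basis
A = Q @ np.diag([10.0, 4.0] + [1.0] * (d - 2)) @ Q.T   # eigengap delta = 6
E = rng.standard_normal((d, d))
E = 0.05 * (E + E.T)                                   # small symmetric noise

def top_eigvec(M):
    """Unit eigenvector of the largest eigenvalue of symmetric M."""
    _, V = np.linalg.eigh(M)   # eigh returns eigenvalues in ascending order
    return V[:, -1]

v, v_pert = top_eigvec(A), top_eigvec(A + E)
sin_theta = np.sqrt(max(0.0, 1.0 - float(v @ v_pert) ** 2))
bound = 2.0 * np.linalg.norm(E, 2) / 6.0               # 2 ||E||_op / delta
```

As the quoted passage notes, bounds of this form degrade when the eigengap is small relative to the noise level, which motivates the sharper analyses cited there.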