2020
DOI: 10.1111/rssb.12359
Right Singular Vector Projection Graphs: Fast High Dimensional Covariance Matrix Estimation under Latent Confounding

Abstract: We consider the problem of estimating a high dimensional p×p covariance matrix Σ, given n observations of confounded data with covariance Σ + ΓΓᵀ, where Γ is an unknown p×q matrix of latent factor loadings. We propose a simple and scalable estimator based on the projection onto the right singular vectors of the observed data matrix, which we call right singular vector projection (RSVP). Our theoretical analysis of this method reveals that, in contrast with approaches based on the removal …
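As a rough sketch of the construction described in the abstract (the paper's exact estimator, scaling, and rank handling may differ), the projection onto the right singular vectors of the data matrix can be computed as follows; the function name and the p/r scaling are illustrative assumptions:

```python
import numpy as np

def rsvp_projection(X):
    """Hedged sketch of the RSVP idea: form a covariance estimate from the
    projection onto the right singular vectors of the n x p data matrix X.
    Discarding the singular *values* (which dense latent confounding can
    inflate) and keeping only the singular *directions* is the key point;
    the p/r scaling below is illustrative only."""
    n, p = X.shape
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))   # numerical rank of X
    V = Vt[:r].T                        # p x r right singular vectors
    return (p / r) * (V @ V.T)          # symmetric p x p estimate, up to scale
```

Note the cost is a single SVD of the n × p data matrix, which is what makes the approach scalable when n ≪ p.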

Cited by 16 publications (14 citation statements)
References 47 publications (123 reference statements)
“…We used the 'muscle-skeletal' dataset from the GTEx project, which has 491 rows and 14 713 columns, as our uncorrupted dataset. We preprocessed the data as in Shah et al [2020] by regressing out measured and estimated confounders. We took as a response variable a column randomly selected (anew in each run) from the matrix, meaning that for our experiment n = 491 and p = 14 712.…”
Section: Corrupted Data
confidence: 99%
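The preprocessing step quoted above, regressing out measured and estimated confounders, can be sketched roughly as follows; the function name, the plain linear least-squares detail, and the intercept handling are illustrative assumptions rather than the exact GTEx pipeline of Shah et al [2020]:

```python
import numpy as np

def regress_out(X, Z):
    """Remove from each column of X the part explained linearly by the
    confounder matrix Z (with an intercept column added). Illustrative
    sketch only; the exact preprocessing in Shah et al. [2020] may differ."""
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])   # add intercept
    beta, *_ = np.linalg.lstsq(Z1, X, rcond=None)    # one fit for all columns
    return X - Z1 @ beta                             # residuals of X given Z
```

The returned residual matrix is, by construction, orthogonal to the columns of Z and mean-centred, which is the usual sense in which confounders have been "regressed out".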
“…The variables A would encode the first few principal components of X, and hence A would not be exogenous: then OLS+A (γ = 0) is adjusting for these first principal components and aims to remove hidden confounding bias in estimating the causal parameter β. This common practice in applied statistics (e.g., Novembre et al, 2008) can be justified under the assumption of "dense confounding", where the hidden variables H affect most of the X components (Ćevid, Bühlmann and Meinshausen, 2018); see also closely related work by, for example, Chandrasekaran, Parrilo and Willsky (2012), Shah et al (2018), Guo, Ćevid and Bühlmann (2020). The theoretical and methodological arguments are different since A is now a proxy for the hidden latent confounder H, very different from a valid instrument and not exogenous.…”
Section: Choosing the Amount of Causal Regularization, Amplification Bias and Specification of Anchors
confidence: 99%
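The adjustment described in this excerpt, including leading principal components as proxy covariates A in an OLS regression, can be sketched under simplifying assumptions. Here `x` is a treatment variable, `y` a response, and `W` a high-dimensional covariate matrix whose top principal component scores stand in for A; all names and the simulated setting are hypothetical, not the paper's procedure:

```python
import numpy as np

def ols_with_pc_anchors(x, y, W, k=2):
    """Regress y on the treatment x while adjusting for the top-k principal
    component scores of W, used as proxies A for a dense hidden confounder.
    Illustrative sketch only; not the exact anchor-regression method."""
    Wc = W - W.mean(axis=0)                      # centre before the SVD
    U, s, _ = np.linalg.svd(Wc, full_matrices=False)
    A = U[:, :k] * s[:k]                         # top-k PC scores of W
    D = np.column_stack([np.ones(len(y)), x, A]) # intercept, treatment, proxies
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef[1]                               # coefficient on the treatment
```

Under dense confounding, the hidden variable loads on most columns of W, so its top principal components approximately recover it; adjusting for them then removes most of the confounding bias in the treatment coefficient.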
“…We mention here that Shah et al (2020) provide vaguely related results on robustness for the GTEx data for another Ridge-type procedure for undirected graphical models.…”
Section: An Illustration on Data from the GTEx Consortium
confidence: 99%