Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts latent principal factors that contribute to the most variation of the data. When data are stored across multiple machines, however, communication cost can prohibit the computation of PCA in a central location and distributed algorithms for PCA are thus needed. This paper proposes and studies a distributed PCA algorithm: each node machine computes the top K eigenvectors and transmits them to the central server; the central server then aggregates the information from all the node machines and conducts a PCA based on the aggregated information. We investigate the bias and variance for the resulting distributed estimator of the top K eigenvectors. In particular, we show that for distributions with symmetric innovation, the empirical top eigenspaces are unbiased and hence the distributed PCA is “unbiased”. We derive the rate of convergence for distributed PCA estimators, which depends explicitly on the effective rank of covariance, eigen-gap, and the number of machines. We show that when the number of machines is not unreasonably large, the distributed PCA performs as well as the whole sample PCA, even without full access of whole data. The theoretical results are verified by an extensive simulation study. We also extend our analysis to the heterogeneous case where the population covariance matrices are different across local machines but share similar top eigen-structures.
This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
IMPORTANCE While telehealth use in surgery has shown to be feasible, telehealth became a major modality of health care delivery during the COVID-19 pandemic.OBJECTIVE To assess patterns of telehealth use across surgical specialties before and during the COVID-19 pandemic.
This paper introduces a simple principle for robust statistical inference via appropriate shrinkage on the data. This widens the scope of high-dimensional techniques, reducing the distributional conditions from sub-exponential or sub-Gaussian to more relaxed bounded second or fourth moment. As an illustration of this principle, we focus on robust estimation of the low-rank matrix Θ* from the trace regression model Y = Tr(Θ* ⊤ X) + ϵ. It encompasses four popular problems: sparse linear model, compressed sensing, matrix completion and multi-task learning. We propose to apply the penalized least-squares approach to the appropriately truncated or shrunk data. Under only bounded 2+δ moment condition on the response, the proposed robust methodology yields an estimator that possesses the same statistical error rates as previous literature with sub-Gaussian errors. For sparse linear model and multi-task regression, we further allow the design to have only bounded fourth moment and obtain the same statistical rates. As a byproduct, we give a robust covariance estimator with concentration inequality and optimal rate of convergence in terms of the spectral norm, when the samples only bear bounded fourth moment. This result is of its own interest and importance. We reveal that under high dimensions, the sample covariance matrix is not optimal whereas our proposed robust covariance can achieve optimality. Extensive simulations are carried out to support the theories.
This paper introduces a simple principle for robust high-dimensional statistical inference via an appropriate shrinkage on the data. This widens the scope of highdimensional techniques, reducing the moment conditions from sub-exponential or sub-Gaussian distributions to merely bounded second or fourth moment. As an illustration of this principle, we focus on robust estimation of the low-rank matrix Θ * from the trace regression model Y = Tr(Θ * T X) + . It encompasses four popular problems: sparse linear models, compressed sensing, matrix completion and multi-task regression. We propose to apply penalized least-squares approach to appropriately truncated or shrunk data. Under only bounded 2 + δ moment condition on the response, the proposed robust methodology yields an estimator that possesses the same statistical error rates as previous literature with sub-Gaussian errors. For sparse linear models and multitasking regression, we further allow the design to have only bounded fourth moment and obtain the same statistical rates, again, by appropriate shrinkage of the design matrix. As a byproduct, we give a robust covariance matrix estimator and establish its concentration inequality in terms of the spectral norm when the random samples have only bounded fourth moment. Extensive simulations have been carried out to support our theories.
Key Points Question What is the association between a primary care practice’s degree of telehealth use and acute care visits for ambulatory care–sensitive conditions during the COVID-19 pandemic? Findings In this cohort study with a difference-in-differences analysis of insurance claims data from 4038 primary care practices, high primary care telehealth use was associated with 2.10 more emergency department visits or hospitalizations for ambulatory care–sensitive conditions per 1000 patients per year compared with the practices with the least telehealth use. Meaning These findings suggest that high levels of primary care practice telehealth use may result in slightly higher acute care visits for ambulatory care–sensitive conditions.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.