We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model-agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise-agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations, as in a Hidden Markov Model; e.g., recovering the time-varying parameter of a Poisson process without knowing that the underlying process is Poisson. Furthermore, since our forecasting algorithm requires regression with noisy features, our approach suggests a matrix estimation based method, coupled with a novel, non-standard matrix estimation error metric, to solve the error-in-variable regression problem, which could be of interest in its own right. Through synthetic and real-world datasets, we demonstrate that our algorithm outperforms standard software packages (including R libraries) in the presence of missing data as well as high levels of noise.
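The three steps named in this abstract (Page matrix, matrix estimation, linear regression) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes hard singular value thresholding as the matrix estimation step and regresses the last row of the Page matrix on the de-noised rows above it. The function names, the `rank` parameter, and the demo series are all hypothetical.

```python
import numpy as np

def page_matrix(series, L):
    """Stack consecutive, non-overlapping length-L segments as columns."""
    series = np.asarray(series, dtype=float)
    n_cols = len(series) // L  # a trailing partial segment is dropped
    return series[: L * n_cols].reshape(n_cols, L).T

def hard_threshold_svd(matrix, rank):
    """Zero-fill NaNs, keep the top `rank` singular components, and
    rescale by the observed fraction (assumes at least one observed entry)."""
    observed = ~np.isnan(matrix)
    p_hat = observed.mean()
    filled = np.where(observed, matrix, 0.0)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank] / p_hat

def fit_forecaster(series, L, rank):
    """Regress the last Page-matrix row on the de-noised first L - 1 rows."""
    P = page_matrix(series, L)
    features = hard_threshold_svd(P[:-1], rank).T  # shape (n_cols, L - 1)
    target = P[-1]
    seen = ~np.isnan(target)  # fit only where the target is observed
    # The features have rank at most `rank`, so use an explicit cutoff to
    # obtain the minimum-norm least-squares solution.
    beta, *_ = np.linalg.lstsq(features[seen], target[seen], rcond=1e-8)
    return beta

# Hypothetical demo: a noisy sinusoid with about 20% of entries missing.
rng = np.random.default_rng(0)
t = np.arange(400)
noisy = np.sin(2 * np.pi * t / 25) + 0.1 * rng.standard_normal(t.size)
noisy[rng.random(t.size) < 0.2] = np.nan
imputed = hard_threshold_svd(page_matrix(noisy, L=10), rank=2)
beta = fit_forecaster(noisy, L=10, rank=2)
```

A forecast for the next value is then the inner product of `beta` with the most recent `L - 1` (imputed) observations. The rank is fixed by hand here; in practice it would need to be chosen from the data.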
We develop a method to help quantify the impact that different levels of mobility restrictions have had on COVID-19 related deaths across various countries. Synthetic control (SC), regarded as the "most important innovation in the policy evaluation in the last 15 years" (8), has emerged as a standard tool to produce counterfactual estimates if a particular intervention had not occurred, using just observational data. However, extending SC to obtain counterfactual estimates if a particular intervention had occurred remains an important open problem (4); this is precisely the question that arises when assessing the impacts of varying mobility restrictions as stated above. As the main contribution of this work, we introduce synthetic interventions (SI), which helps resolve this open problem by providing counterfactual estimates for multiple interventions of interest. We introduce a tensor factor model, a natural generalization of matrix factor models used to analyze SC, and prove that SI produces consistent counterfactual estimates under this setting. Our finite sample analyses show the test (out-of-sample) error decays as $1/T_0$, where $T_0$ is the amount of observed pre-intervention (training) data. As a special case of our result, this improves upon the $1/\sqrt{T_0}$ bound on the test error for SC in prior works. We prove that our test error bound holds under a certain "subspace inclusion" condition, and furnish a data-driven hypothesis test, with provable guarantees, to check for this condition. Again, as a special case, this provides a quantitative hypothesis test for the validity of when to apply SC, which has been absent in the literature. As a technical contribution, we establish that both the parameter estimation and test error for Principal Component Regression (a key subroutine of SI and several SC variants) decay as $1/T_0$ under the high-dimensional error-in-variable regression setting; this improves upon the best prior test error bound of $1/\sqrt{T_0}$.
In addition to the COVID-19 case study, we show how SI can be used to perform data-efficient, personalized randomized control trials (or A/B tests) using real data from a large e-commerce website and a large developmental economics study, thereby establishing its widespread applicability.
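One way to picture the SI estimate is the sketch below. It is an illustration under stated assumptions rather than the paper's procedure: outcomes are assumed to follow a noiseless low-rank factor model, the weights are learned with Principal Component Regression on pre-intervention data, and the same weights are applied to donor units that received the intervention of interest. The function name, argument names, and the `rank` parameter are hypothetical.

```python
import numpy as np

def synthetic_intervention(pre_target, pre_donors, post_donors, rank):
    """Estimate the target unit's counterfactual outcomes under an intervention.

    pre_target:  (T0,)    target outcomes before any intervention
    pre_donors:  (T0, N)  donor outcomes over the same pre-intervention period
    post_donors: (T1, N)  outcomes of the same donors under the intervention
    """
    # Principal Component Regression: regress the target on the top `rank`
    # singular components of the donor matrix.
    U, s, Vt = np.linalg.svd(pre_donors, full_matrices=False)
    weights = Vt[:rank].T @ ((U[:, :rank].T @ pre_target) / s[:rank])
    # Carry the learned weights over to the donors' post-intervention outcomes.
    return post_donors @ weights
```

With real data the donor matrix would be noisy and possibly incomplete, and the abstract's "subspace inclusion" condition governs whether the learned weights transfer from the pre-intervention to the post-intervention period.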
We analyze the classical method of Principal Component Regression (PCR) in the high-dimensional error-in-variables setting. Here, the observed covariates are not only noisy and contain missing data, but the number of covariates can also exceed the sample size. Under suitable conditions, we establish that PCR identifies the unique model parameter with minimum $\ell_2$-norm, and derive non-asymptotic $\ell_2$-rates of convergence that show its consistency. We further provide non-asymptotic out-of-sample prediction performance guarantees that again prove consistency, even in the presence of corrupted unseen data. Notably, our results do not require the out-of-sample covariates to follow the same distribution as that of the in-sample covariates, but rather that they obey a simple linear algebraic constraint. We finish by presenting simulations that illustrate our theoretical results.
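A minimal sketch of PCR in this setting follows. It assumes missing covariates are marked as NaN, zero-filled, and rescaled by the observed fraction before the SVD. That preprocessing, the function name `pcr_fit`, and the `rank` parameter are assumptions made for illustration. The demo takes the "linear algebraic constraint" to mean that the out-of-sample covariates lie in the row space of the in-sample covariates, which is also an assumption here.

```python
import numpy as np

def pcr_fit(Z, y, rank):
    """Principal Component Regression on covariates Z (n x p, NaN = missing).

    Returns the minimum l2-norm coefficient vector within the span of the
    top `rank` right singular vectors of the filled covariate matrix.
    """
    observed = ~np.isnan(Z)
    p_hat = observed.mean()  # assumes at least one observed entry
    filled = np.where(observed, Z, 0.0) / p_hat
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return Vt[:rank].T @ ((U[:, :rank].T @ y) / s[:rank])

# Hypothetical noiseless demo with more covariates (50) than samples (20).
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))
B = rng.standard_normal((3, 50))
X = A @ B                                   # rank-3 covariates
beta_true = B.T @ np.array([1.0, -2.0, 0.5])  # lies in the row space of X
y = X @ beta_true
beta_hat = pcr_fit(X, y, rank=3)

# Out-of-sample rows drawn differently, but still in the row space of X.
X_new = rng.standard_normal((5, 3)) @ B
predictions = X_new @ beta_hat
```

In this noiseless case the minimum-norm parameter is recovered exactly even though p > n, and predictions on `X_new` match `X_new @ beta_true` because its rows stay inside the in-sample row space.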
Managing the outbreak of COVID-19 in India constitutes an unprecedented health emergency in one of the largest and most diverse nations in the world. On May 4, 2020, India started the process of releasing its population from a national lockdown during which extreme social distancing was implemented. We describe and simulate an adaptive control approach to exit this situation, while maintaining the epidemic under control. Adaptive control is a flexible countercyclical policy approach, whereby different areas release from lockdown in potentially different gradual ways, dependent on the local progression of the disease. Because of these features, adaptive control requires the ability to decrease or increase social distancing in response to observed and projected dynamics of the disease outbreak. We show via simulation of a stochastic Susceptible-Infected-Recovered (SIR) model and of a synthetic intervention (SI) model that adaptive control performs at least as well as immediate and full release from lockdown starting May 4 and as full release from lockdown after a month (i.e., after May 31). The key insight is that adaptive response provides the option to increase or decrease socioeconomic activity depending on how it affects disease progression and this freedom allows it to do at least as well as most other policy alternatives. We also discuss the central challenge to any nuanced release policy, including adaptive control, specifically learning how specific policies translate into changes in contact rates and thus COVID-19's reproductive rate in real time.
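The feedback loop behind adaptive control can be illustrated with a toy stochastic SIR simulation. The sketch below is not the model used in the study: it covers a single region, switches between only two contact rates, and uses a simple threshold on daily new infections as the trigger. Every parameter name and value is hypothetical.

```python
import numpy as np

def simulate_adaptive_sir(days, population, initial_infected,
                          beta_open, beta_lockdown, gamma, trigger, seed=0):
    """Discrete-time stochastic SIR with a threshold-based lockdown rule."""
    rng = np.random.default_rng(seed)
    S, I, R = population - initial_infected, initial_infected, 0
    locked = True  # start under lockdown
    history = []
    for _ in range(days):
        beta = beta_lockdown if locked else beta_open
        # Each susceptible is infected with probability 1 - exp(-beta * I / N).
        new_inf = rng.binomial(S, 1.0 - np.exp(-beta * I / population))
        # Each infected person recovers with probability 1 - exp(-gamma).
        new_rec = rng.binomial(I, 1.0 - np.exp(-gamma))
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
        # Adaptive rule: tighten when daily infections exceed the trigger,
        # release otherwise.
        locked = bool(new_inf > trigger)
        history.append((int(S), int(I), int(R), locked))
    return history

# Hypothetical usage with illustrative parameter values.
run = simulate_adaptive_sir(days=120, population=100_000, initial_infected=50,
                            beta_open=0.4, beta_lockdown=0.1, gamma=0.1,
                            trigger=200)
```

A fixed policy corresponds to holding `locked` constant, so the adaptive rule can be compared against immediate or delayed full release by running the same simulation with the switch disabled.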