Time series segmentation, also known as multiple change-point detection, is a well-established problem. However, few solutions are designed specifically for high-dimensional settings. In this paper, our interest is in segmenting the second-order structure of a high-dimensional time series. In a generic step of a binary segmentation algorithm for multivariate time series, one natural solution is to combine CUSUM statistics obtained from local periodograms and cross-periodograms of the components of the input time series. However, the standard "maximum" and "average" aggregation methods often fail in high dimensions when, for example, the change-points are sparse across the panel or the CUSUM statistics are spuriously large. We propose the Sparsified Binary Segmentation (SBS) algorithm, which aggregates the CUSUM statistics by adding only those that pass a certain threshold. This "sparsifying" step reduces the impact of irrelevant, noisy contributions, which is particularly beneficial in high dimensions. In order to show the consistency of SBS, we introduce the multivariate Locally Stationary Wavelet model for time series, which is a separate contribution of this work.
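The thresholded aggregation at the heart of SBS can be sketched in a few lines. The following is a minimal Python illustration, not the authors' implementation: it uses the standard mean-change CUSUM statistic in place of the paper's periodogram-based CUSUMs, and the threshold value in the usage example is an illustrative assumption.

```python
import numpy as np

def cusum(x):
    """|CUSUM| statistic of a single series at every candidate break b = 1, ..., n-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    b = np.arange(1, n)
    left = np.cumsum(x)[:-1]                       # partial sums x_1 + ... + x_b
    total = x.sum()
    return np.abs(np.sqrt((n - b) / (n * b)) * left
                  - np.sqrt(b / (n * (n - b))) * (total - left))

def sparsified_cusum(panel, threshold):
    """Aggregate component CUSUMs by summing only those exceeding the threshold
    (the 'sparsifying' step); the argmax is the candidate change-point."""
    stats = np.array([cusum(row) for row in panel])   # shape (d, n-1)
    stats[stats < threshold] = 0.0                    # drop sub-threshold contributions
    return stats.sum(axis=0)
```

With change-points present in only a few of the d component series, the thresholding zeroes out the purely noisy CUSUM curves, so the aggregate peak is not diluted by the null coordinates.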
In this paper, we consider the problem of (multiple) change-point detection in panel data. We propose the double CUSUM statistic, which exploits the cross-sectional change-point structure by examining the cumulative sums of ordered CUSUMs at each point in time. The efficiency of the proposed change-point test is studied, reflected in the rate at which the cross-sectional size of a change is permitted to converge to zero while remaining detectable. The consistency of the proposed change-point detection procedure, based on the binary segmentation algorithm, is also established in terms of both the total number and the locations (in time) of the estimated change-points. Motivated by the representation properties of the Generalised Dynamic Factor Model, we propose a bootstrap procedure for test criterion selection, which accounts for both cross-sectional and within-series correlations in high-dimensional data. The empirical performance of the double CUSUM statistic, equipped with the proposed bootstrap scheme, is investigated in a comparative simulation study against the state of the art. As an application, we analyse the log returns of S&P 100 component stock prices over a period of one year.
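The "cumulative sums of ordered CUSUMs" idea can be illustrated as follows. This is a hedged Python sketch, not the paper's implementation: the sqrt(m) scaling of the partial sums is a simplifying assumption standing in for the paper's exact weighting, and the mean-change CUSUM is used for concreteness.

```python
import numpy as np

def cusum(x):
    """|CUSUM| of one series at every candidate break b = 1, ..., n-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    b = np.arange(1, n)
    left = np.cumsum(x)[:-1]
    total = x.sum()
    return np.abs(np.sqrt((n - b) / (n * b)) * left
                  - np.sqrt(b / (n * (n - b))) * (total - left))

def double_cusum(panel):
    """At each candidate break, order the d component CUSUMs decreasingly and take
    the largest sqrt(m)-scaled partial sum over the top m of them."""
    stats = np.array([cusum(row) for row in panel])   # (d, n-1)
    d = stats.shape[0]
    ordered = -np.sort(-stats, axis=0)                # descending within each column
    partial = np.cumsum(ordered, axis=0)              # partial sums of ordered CUSUMs
    scale = np.sqrt(np.arange(1, d + 1))[:, None]     # simplified scaling assumption
    return (partial / scale).max(axis=0)              # maximise over m at each break
```

Maximising over the number m of included coordinates adapts the statistic to the unknown cross-sectional sparsity of the change: a change affecting few series is picked up at small m, a dense change at large m.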
In this paper, we propose a fast, well-performing, and consistent method for segmenting a piecewise-stationary, linear time series with an unknown number of breakpoints. The time series model we use is the nonparametric Locally Stationary Wavelet model, in which a complete description of the piecewise-stationary second-order structure is provided by wavelet periodograms computed at multiple scales and locations. The initial stage of our method is a new binary segmentation procedure, with a theoretically justified and rapidly computable test criterion that detects breakpoints in wavelet periodograms separately at each scale. This is followed by within-scale and across-scales post-processing steps, leading to consistent estimation of the number and locations of breakpoints in the second-order structure of the original process. An extensive simulation study demonstrates good performance of our method.
Keywords: binary segmentation, breakpoint detection, locally stationary wavelet model, piecewise stationarity, post-processing, wavelet periodogram.
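As a toy illustration of the ingredients, the following Python sketch (not the authors' implementation) computes the finest-scale Haar wavelet periodogram, whose mean tracks the local variance, and locates the maximiser of a CUSUM statistic applied to it; coarser scales and the within-scale and across-scales post-processing are omitted.

```python
import numpy as np

def haar_periodogram(x):
    """Finest-scale Haar wavelet periodogram: squared Haar detail coefficients."""
    x = np.asarray(x, dtype=float)
    d = (x[1:] - x[:-1]) / np.sqrt(2.0)   # Haar detail coefficients at scale 1
    return d ** 2

def cusum_break(series):
    """Location maximising the |CUSUM| statistic of a single series."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    b = np.arange(1, n)
    left = np.cumsum(series)[:-1]
    total = series.sum()
    stat = np.abs(np.sqrt((n - b) / (n * b)) * left
                  - np.sqrt(b / (n * (n - b))) * (total - left))
    return int(np.argmax(stat)) + 1
```

A change in variance in the original series appears as a change in mean of the periodogram, which is where the CUSUM test criterion operates.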
We propose the first comprehensive treatment of high-dimensional time series factor models with multiple change-points in their second-order structure. We operate under the most flexible definition of piecewise stationarity, estimate the number and locations of change-points consistently, and identify whether they originate in the common or idiosyncratic components. Through the use of wavelets, we transform the problem of change-point detection in the second-order structure of a high-dimensional time series into the (relatively easier) problem of change-point detection in the means of high-dimensional panel data. Also, our methodology circumvents the difficult issue of accurately estimating the true number of factors in the presence of multiple change-points by adopting a screening procedure. We further show that consistent factor analysis is achieved over each segment defined by the change-points estimated by the proposed methodology. In extensive simulation studies, we observe that factor analysis prior to change-point detection improves the detectability of change-points, and we identify and describe an interesting 'spillover' effect in which substantial breaks in the idiosyncratic components get, naturally enough, identified as change-points in the common components, which prompts us to regard the corresponding change-points as also acting as a form of 'factors'. Our methodology is implemented in the R package factorcpt, available from CRAN.
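One ingredient of the pipeline, separating the panel into common and idiosyncratic components, can be sketched as a rank-r PCA fit, after which panel change-point detection can be applied to each part separately. This is an illustrative simplification in Python, not the factorcpt implementation, and the number of factors r is assumed known here.

```python
import numpy as np

def pca_split(X, r):
    """Split a (d, n) panel X into a common part (best rank-r approximation
    via the SVD) and the idiosyncratic remainder."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    common = (U[:, :r] * s[:r]) @ Vt[:r]   # rank-r PCA reconstruction
    return common, X - common
```

When the factor structure is strong, the rank-r fit absorbs most of the panel's energy, and breaks in the factor loadings or factor dynamics surface in the common part.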
This paper considers variable selection in linear regression models where the number of covariates is possibly much larger than the number of observations. High dimensionality of the data brings many complications, such as (possibly spurious) high correlations among the variables, which render marginal correlation unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response which takes into account high correlations among the variables in a data-driven way. The proposed tilting procedure provides an adaptive choice between the use of marginal correlation and tilted correlation for each variable, where the choice is made depending on the values of the hard-thresholded sample correlation of the design matrix. We study the conditions under which this measure can successfully discriminate between the relevant and the irrelevant variables and thus be used as a tool for variable selection. Finally, an iterative variable screening algorithm is constructed to exploit the theoretical properties of tilted correlation, and its good practical performance is demonstrated in a comparative simulation study.
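The adaptive choice between marginal and tilted correlation can be illustrated as follows. This is a hedged Python sketch, not the paper's procedure: the hard threshold `pi` is an assumed tuning parameter, and projection onto the orthogonal complement of the highly correlated columns is used as a simplified stand-in for the exact tilting construction.

```python
import numpy as np

def tilted_correlations(X, y, pi):
    """For each column j of X: if no other column's sample correlation with X_j
    exceeds the hard threshold pi, use the plain marginal correlation with y;
    otherwise first project X_j and y off the highly correlated columns."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float) - np.mean(y)
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardised design
    C = Xs.T @ Xs / n                           # sample correlation matrix
    out = np.empty(p)
    for j in range(p):
        heavy = np.where((np.abs(C[j]) > pi) & (np.arange(p) != j))[0]
        xj, yr = Xs[:, j], y
        if heavy.size:                          # 'tilt': remove heavy correlations
            Q, _ = np.linalg.qr(Xs[:, heavy])
            xj = xj - Q @ (Q.T @ xj)
            yr = yr - Q @ (Q.T @ yr)
        denom = np.linalg.norm(xj) * np.linalg.norm(yr)
        out[j] = np.abs(xj @ yr) / denom if denom > 0 else 0.0
    return out
```

In the usage below, a variable that is merely correlated with the truly relevant one receives a small tilted correlation once the relevant variable is projected out, while an independent null variable stays small under marginal correlation.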