This paper studies a Dantzig-selector-type regularized estimator for linear functionals of high-dimensional linear processes. Explicit rates of convergence of the proposed estimator are obtained, covering a broad regime from i.i.d. samples to long-range dependent time series and from sub-Gaussian innovations to those with only mild polynomial moments. It is shown that the convergence rates depend on the degree of temporal dependence and the moment conditions of the underlying linear processes. The Dantzig-selector estimator is applied to sparse Markowitz portfolio allocation and to optimal linear prediction for time series, for which ratio consistency relative to an oracle estimator is established. The effect of dependence and innovation moment conditions is further illustrated in a simulation study. Finally, the regularized estimator is applied to classifying cognitive states on a real fMRI dataset and to portfolio optimization on a financial dataset.
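As background for the abstract above, the classical Dantzig selector minimizes the ℓ1-norm of the coefficients subject to a sup-norm constraint on the correlation of residuals with the design, and it can be cast as a linear program. The sketch below (not the paper's estimator for linear processes, just the standard i.i.d. formulation, with `lam` a user-chosen tuning parameter) solves it with `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Minimize ||b||_1 subject to ||X.T @ (y - X @ b)||_inf <= lam.

    Reformulated as an LP in variables z = [b; u] with b <= u, -b <= u,
    minimizing sum(u). Returns the estimated coefficient vector b.
    """
    n, p = X.shape
    G = X.T @ X          # Gram matrix
    c = X.T @ y
    obj = np.concatenate([np.zeros(p), np.ones(p)])  # minimize sum(u)
    I = np.eye(p)
    Z = np.zeros((p, p))
    # Constraints (all as A_ub @ z <= b_ub):
    #   b - u <= 0,  -b - u <= 0        (|b_j| <= u_j)
    #   G b <= lam + c,  -G b <= lam - c (sup-norm residual constraint)
    A_ub = np.block([[I, -I], [-I, -I], [G, Z], [-G, Z]])
    b_ub = np.concatenate([np.zeros(2 * p), lam + c, lam - c])
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (2 * p), method="highs")
    return res.x[:p]
```

Since the true coefficient vector is feasible whenever the noise correlation stays below `lam`, the solution's ℓ1-norm never exceeds that of the truth, which is the source of the method's sparsity guarantees.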
The k-nearest neighbor (kNN) algorithm is a classic supervised machine learning algorithm, widely used in cyber-physical-social systems (CPSS) to analyze and mine data. However, in practical CPSS applications, the standard linear-scan kNN algorithm struggles to process massive data sets efficiently. This paper proposes a distributed storage and computation k-nearest neighbor (D-kNN) algorithm. The D-kNN algorithm has the following advantages. First, the concept of the k-nearest neighbor boundary is proposed, and restricting the search to points within this boundary effectively reduces the time complexity of kNN. Second, based on the k-nearest neighbor boundary, massive data sets that exceed main storage space are stored across distributed storage nodes. Third, the algorithm performs the k-nearest neighbor search efficiently through distributed computation at each storage node. Finally, a series of experiments was performed to verify the effectiveness of the D-kNN algorithm. The experimental results show that the D-kNN algorithm, based on distributed storage and computation, effectively improves the efficiency of k-nearest neighbor search. The algorithm can be easily and flexibly deployed in a cloud-edge computing environment to process massive data sets in CPSS.
INDEX TERMS kNN, k-nearest neighbor boundary, distributed storage and computation, cloud-edge computing, CPSS.
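The distributed search described above follows a common scatter-gather pattern: each storage node ranks only its own shard, and a coordinator merges the per-node candidates into the global top-k. A minimal sketch of that pattern (not the paper's D-kNN with its k-nearest neighbor boundary pruning; `local_knn` and `distributed_knn` are illustrative names):

```python
import numpy as np

def local_knn(shard, query, k):
    """Run on one storage node: return its k nearest points to the query."""
    dists = np.linalg.norm(shard - query, axis=1)
    order = np.argsort(dists)[:k]          # node-local top-k candidates
    return dists[order], shard[order]

def distributed_knn(shards, query, k):
    """Coordinator: gather each node's candidates, keep the global top-k."""
    cand_d, cand_p = [], []
    for shard in shards:                   # in practice, parallel RPCs
        d, p = local_knn(shard, query, k)
        cand_d.append(d)
        cand_p.append(p)
    all_d = np.concatenate(cand_d)
    all_p = np.vstack(cand_p)
    order = np.argsort(all_d)[:k]          # merge step: at most k per node
    return all_d[order], all_p[order]
```

Because each node returns at most k candidates, the coordinator merges only `k * num_nodes` items regardless of total data size, which is what makes the scatter-gather step cheap relative to a single full scan.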