We study the problem of lossless feature selection for a $d$-dimensional feature vector $X = (X^{(1)}, \dots, X^{(d)})$ and label $Y$ for binary classification as well as nonparametric regression. For an index set $S \subset \{1, \dots, d\}$, consider the selected $|S|$-dimensional feature subvector $X_S = (X^{(i)},\, i \in S)$. If $L^*$ and $L^*(S)$ stand for the minimum risk based on $X$ and $X_S$, respectively, then $X_S$ is called lossless if $L^* = L^*(S)$. For classification, the minimum risk is the Bayes error probability, while in regression, the minimum risk is the residual variance. We introduce nearest-neighbor based test statistics to test the hypothesis that $X_S$ is lossless. For the threshold $a_n = \log n / \sqrt{n}$, the corresponding tests are proved to be consistent under conditions on the distribution of $(X, Y)$ that are significantly milder than in previous work. Also, our threshold is dimension-independent, in contrast to earlier methods, where for large $d$ the threshold becomes too large to be useful in practice.
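As a rough illustration of the testing idea (not the authors' exact statistic), the Python sketch below compares leave-one-out 1-nearest-neighbor error estimates on the full feature vector and on the candidate subvector, and rejects losslessness when the gap exceeds the dimension-independent threshold $a_n = \log n / \sqrt{n}$; the function names and the use of 1-NN error as a risk proxy are illustrative assumptions.

```python
import numpy as np

def loo_1nn_error(X, y):
    """Leave-one-out 1-nearest-neighbor error, used here as a crude proxy
    for the minimum risk (illustrative only; not the paper's statistic)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # a point must not be its own neighbor
    return np.mean(y[D.argmin(axis=1)] != y)

def reject_lossless(X, y, S):
    """Reject H0: 'X_S is lossless' when the estimated risk gap exceeds a_n."""
    n = len(y)
    a_n = np.log(n) / np.sqrt(n)         # the dimension-independent threshold
    gap = loo_1nn_error(X[:, sorted(S)], y) - loo_1nn_error(X, y)
    return gap > a_n

# Toy check: feature 0 determines the label, feature 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)
print(reject_lossless(X, y, S={0}))      # False: {0} is (nearly) lossless
print(reject_lossless(X, y, S={1}))      # True: dropping feature 0 is lossy
```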
The problem of nonparametric estimation of a probability distribution is considered from three viewpoints: consistency in total variation, consistency in information divergence, and consistency in reversed-order information divergence. These are relatively strong criteria of convergence, and a probability distribution cannot be consistently estimated in any of these senses without restrictions on the class of probability distributions allowed. Histogram-based estimators of the distribution are presented which, under certain conditions, converge in total variation, in information divergence, and in reversed-order information divergence to the unknown probability distribution. Some a priori information about the true probability distribution is assumed in each case. As consistency in information divergence is stronger than convergence in total variation, additional assumptions are imposed in the information-divergence cases.
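A minimal sketch of the total-variation part of this program, assuming a known bounded support as the a priori information (the function names are illustrative): a histogram density estimate is formed and its total variation distance to the true density shrinks as the sample grows. Consistency in the information divergences needs extra assumptions, e.g. a density bounded away from zero, which this sketch does not address.

```python
import numpy as np

def histogram_density(sample, bins, support):
    """Histogram density estimate on a known bounded support (the a priori info)."""
    counts, edges = np.histogram(sample, bins=bins, range=support)
    widths = np.diff(edges)
    return counts / (counts.sum() * widths), edges

def tv_distance(p, q, widths):
    """Total variation distance between piecewise-constant densities on one grid."""
    return 0.5 * np.sum(np.abs(p - q) * widths)

# The TV error against the true uniform density decreases with the sample size.
rng = np.random.default_rng(1)
for n in (100, 10_000):
    est, edges = histogram_density(rng.uniform(size=n), bins=20, support=(0.0, 1.0))
    print(n, tv_distance(est, np.ones_like(est), np.diff(edges)))
```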
The purpose of this paper is to introduce sequential investment strategies that guarantee an optimal rate of growth of the capital under minimal assumptions on the behavior of the market. The new strategies are analyzed both theoretically and empirically. The theoretical results show that the asymptotic rate of growth matches the optimal rate achievable with full knowledge of the statistical properties of the underlying process generating the market, under the sole assumption that the market is stationary and ergodic. The empirical results show that the performance of the proposed investment strategies, measured on past NYSE and currency exchange data, is solid, and sometimes even spectacular.
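For concreteness, the quantity being optimized is the average log-growth of capital: with portfolio vectors $b_t$ and market return vectors $x_t$, wealth after $n$ periods is $S_n = \prod_{t=1}^{n} \langle b_t, x_t \rangle$, so the growth rate is $\frac{1}{n}\sum_{t=1}^{n} \log \langle b_t, x_t \rangle$. A small sketch (variable names are illustrative) evaluating this rate for a uniform constantly rebalanced portfolio on synthetic returns:

```python
import numpy as np

def average_growth_rate(weights, returns):
    """(1/n) * sum_t log <b_t, x_t>: the per-period log-growth of capital."""
    wealth_factors = np.einsum('td,td->t', weights, returns)
    return np.log(wealth_factors).mean()

# Uniform constantly rebalanced portfolio on synthetic positive price relatives.
rng = np.random.default_rng(2)
returns = rng.lognormal(mean=0.0, sigma=0.05, size=(1000, 3))
weights = np.full_like(returns, 1.0 / returns.shape[1])
print(average_growth_rate(weights, returns))
```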
Summary: In recent years, optimal portfolio selection strategies for sequential investment have been shown to exist. Although their asymptotic optimality is well established, good finite-sample performance requires tuning parameters that depend on dimensionality and scale. In this paper we introduce nearest-neighbor based portfolio selectors that solve these problems, and we show that they are also log-optimal for the very general class of stationary and ergodic random processes. The newly proposed algorithm shows very good finite-horizon performance when applied, without any change, to markets of different dimensionality and scale: we see it as a very robust strategy.
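A heavily simplified single-expert sketch of a nearest-neighbor portfolio selector, with hypothetical parameter names (k past periods matched, ell neighbors); in this line of work, many such (k, ell) experts are actually combined, whereas the sketch keeps one expert for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def nn_portfolio(history, k=2, ell=10):
    """One step of a simplified nearest-neighbor portfolio selector (sketch).
    history: (t, d) array of past price-relative vectors x_1, ..., x_t."""
    t, d = history.shape
    if t < k + ell + 1:
        return np.full(d, 1.0 / d)                 # not enough data: uniform portfolio
    query = history[-k:].ravel()
    segments = np.stack([history[i:i + k].ravel() for i in range(t - k)])
    nearest = np.argsort(np.linalg.norm(segments - query, axis=1))[:ell]
    follow = history[nearest + k]                  # returns right after each matched segment
    # Choose the fixed portfolio maximizing empirical log-wealth over the matches.
    res = minimize(lambda b: -np.log(follow @ b + 1e-12).sum(),
                   np.full(d, 1.0 / d),
                   bounds=[(0.0, 1.0)] * d,
                   constraints=({'type': 'eq', 'fun': lambda b: b.sum() - 1.0},))
    return res.x
```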
The setting is a stationary, ergodic time series. The challenge is to construct a sequence of functions, each based on only a finite segment of the past, which together provide a strongly consistent estimator for the conditional probability of the next observation, given the infinite past. Ornstein gave such a construction for the case where the values are from a finite set, and recently Algoet extended the scheme to time series with coordinates in a Polish space. The present study presents a different solution to the challenge. The algorithm is simple and its verification is fairly transparent. Some extensions to regression, pattern recognition, and on-line forecasting are mentioned.
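A toy version of the pattern-matching idea for a finite alphabet (the names are illustrative, and the actual scheme chooses the matched block length in a data-driven way that grows with n): estimate the conditional law of the next symbol by the empirical distribution of what followed past occurrences of the current length-k suffix.

```python
import numpy as np

def next_symbol_distribution(x, k):
    """Empirical conditional distribution of the next symbol given the
    length-k suffix, from past recurrences of that suffix (toy sketch)."""
    n = len(x)
    suffix = tuple(x[n - k:])
    follows = [x[i + k] for i in range(n - k) if tuple(x[i:i + k]) == suffix]
    if not follows:
        return {}                                  # suffix never seen before
    values, counts = np.unique(follows, return_counts=True)
    return dict(zip(values.tolist(), (counts / len(follows)).tolist()))

# Toy check on a two-state Markov chain with P(1|0)=0.2, P(1|1)=0.6.
rng = np.random.default_rng(3)
x, state = [], 0
for _ in range(20_000):
    x.append(state)
    state = int(rng.random() < (0.2 if state == 0 else 0.6))
print(next_symbol_distribution(x, k=1))   # close to the transition row of the last state
```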
Let $(X, Y)$ be an $\mathbb{R}^d \times \mathbb{R}$-valued regression pair, where $X$ has a density and $Y$ is bounded. If $n$ i.i.d. samples are drawn from this distribution, the Nadaraya-Watson kernel regression estimate in $\mathbb{R}^d$ with the Hilbert kernel $K(x) = 1/\|x\|^d$ is shown to converge weakly for all such regression pairs. We also show that strong convergence cannot be obtained. This is particularly interesting, as this regression estimate does not have a smoothing parameter.
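The estimate itself translates directly into code; a minimal sketch follows (the tiny-distance guard is an implementation convenience, not part of the definition):

```python
import numpy as np

def hilbert_kernel_regression(X, y, x0):
    """Nadaraya-Watson estimate with the Hilbert kernel K(u) = 1/||u||^d.
    Note the absence of any bandwidth or smoothing parameter."""
    d = X.shape[1]
    dist = np.linalg.norm(X - x0, axis=1)
    w = 1.0 / np.maximum(dist, 1e-12) ** d   # guard: query coinciding with a sample
    return np.dot(w, y) / w.sum()

# Toy check: recover sin(3 * x1) at a query point from noisy samples.
rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=2000)
print(hilbert_kernel_regression(X, y, np.array([0.3, -0.2])), np.sin(0.9))
```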