We study the problem of lossless feature selection for a $d$-dimensional feature vector $X = (X^{(1)}, \dots, X^{(d)})$ and label $Y$ for binary classification as well as nonparametric regression. For an index set $S \subset \{1, \dots, d\}$, consider the selected $|S|$-dimensional feature subvector $X_S = (X^{(i)},\, i \in S)$. If $L^*$ and $L^*(S)$ stand for the minimum risk based on $X$ and $X_S$, respectively, then $X_S$ is called lossless if $L^* = L^*(S)$. For classification, the minimum risk is the Bayes error probability, while in regression, the minimum risk is the residual variance. We introduce nearest-neighbor based test statistics to test the hypothesis that $X_S$ is lossless. For the threshold $a_n = \log n / \sqrt{n}$, the corresponding tests are proved to be consistent under conditions on the distribution of $(X, Y)$ that are significantly milder than in previous work. Also, our threshold is dimension-independent, in contrast to earlier methods, where for large $d$ the threshold becomes too large to be useful in practice.
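As a rough illustration of the testing idea (not the authors' exact statistic), the Python sketch below compares leave-one-out 1-nearest-neighbor error estimates on the full feature vector and on the candidate subvector, and rejects losslessness when the gap exceeds the dimension-independent threshold $a_n = \log n / \sqrt{n}$; the function names and the use of 1-NN error as a risk proxy are illustrative assumptions.

```python
import numpy as np

def loo_1nn_error(X, y):
    """Leave-one-out 1-nearest-neighbor error, used here as a crude proxy
    for the minimum risk (illustrative only; not the paper's statistic)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # a point must not be its own neighbor
    return np.mean(y[D.argmin(axis=1)] != y)

def reject_lossless(X, y, S):
    """Reject H0: 'X_S is lossless' when the estimated risk gap exceeds a_n."""
    n = len(y)
    a_n = np.log(n) / np.sqrt(n)         # the dimension-independent threshold
    gap = loo_1nn_error(X[:, sorted(S)], y) - loo_1nn_error(X, y)
    return gap > a_n

# Toy check: feature 0 determines the label, feature 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)
print(reject_lossless(X, y, S={0}))      # False: {0} is (nearly) lossless
print(reject_lossless(X, y, S={1}))      # True: dropping feature 0 is lossy
```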
The problem of nonparametric estimation of a probability distribution is considered from three viewpoints: consistency in total variation, consistency in information divergence, and consistency in reversed-order information divergence. These are relatively strong criteria of convergence, and a probability distribution cannot be consistently estimated in any of these senses without restrictions on the class of probability distributions allowed. Histogram-based estimators of the distribution are presented which, under certain conditions, converge in total variation, in information divergence, and in reversed-order information divergence to the unknown probability distribution. Some a priori information about the true probability distribution is assumed in each case. As consistency in information divergence is stronger than convergence in total variation, additional assumptions are imposed in the information-divergence cases.
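A minimal sketch of the total-variation part of this program, assuming a known bounded support as the a priori information (the function names are illustrative): a histogram density estimate is formed and its total variation distance to the true density shrinks as the sample grows. Consistency in the information divergences needs extra assumptions, e.g. a density bounded away from zero, which this sketch does not address.

```python
import numpy as np

def histogram_density(sample, bins, support):
    """Histogram density estimate on a known bounded support (the a priori info)."""
    counts, edges = np.histogram(sample, bins=bins, range=support)
    widths = np.diff(edges)
    return counts / (counts.sum() * widths), edges

def tv_distance(p, q, widths):
    """Total variation distance between piecewise-constant densities on one grid."""
    return 0.5 * np.sum(np.abs(p - q) * widths)

# The TV error against the true uniform density decreases with the sample size.
rng = np.random.default_rng(1)
for n in (100, 10_000):
    est, edges = histogram_density(rng.uniform(size=n), bins=20, support=(0.0, 1.0))
    print(n, tv_distance(est, np.ones_like(est), np.diff(edges)))
```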
The purpose of this paper is to introduce sequential investment strategies that guarantee an optimal rate of growth of the capital under minimal assumptions on the behavior of the market. The new strategies are analyzed both theoretically and empirically. The theoretical results show that the asymptotic rate of growth matches the optimal rate achievable with full knowledge of the statistical properties of the underlying process generating the market, under the sole assumption that the market is stationary and ergodic. The empirical results show that the performance of the proposed investment strategies, measured on past NYSE and currency exchange data, is solid, and sometimes even spectacular.
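For concreteness, the quantity being optimized is the average log-growth of capital: with portfolio vectors $b_t$ and market return vectors $x_t$, wealth after $n$ periods is $S_n = \prod_{t=1}^{n} \langle b_t, x_t \rangle$, so the growth rate is $\frac{1}{n}\sum_{t=1}^{n} \log \langle b_t, x_t \rangle$. A small sketch (variable names are illustrative) evaluating this rate for a uniform constantly rebalanced portfolio on synthetic returns:

```python
import numpy as np

def average_growth_rate(weights, returns):
    """(1/n) * sum_t log <b_t, x_t>: the per-period log-growth of capital."""
    wealth_factors = np.einsum('td,td->t', weights, returns)
    return np.log(wealth_factors).mean()

# Uniform constantly rebalanced portfolio on synthetic positive price relatives.
rng = np.random.default_rng(2)
returns = rng.lognormal(mean=0.0, sigma=0.05, size=(1000, 3))
weights = np.full_like(returns, 1.0 / returns.shape[1])
print(average_growth_rate(weights, returns))
```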
Summary: In recent years, optimal portfolio selection strategies for sequential investment have been shown to exist. Although their asymptotic optimality is well established, good finite-sample performance requires tuning parameters that depend on dimensionality and scale. In this paper we introduce nearest-neighbor based portfolio selectors that solve these problems, and we show that they are also log-optimal for the very general class of stationary and ergodic random processes. The newly proposed algorithm shows very good finite-horizon performance when applied, without any change, to markets of different dimensionality and scale: we see it as a very robust strategy.
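A heavily simplified single-expert sketch of a nearest-neighbor portfolio selector, with hypothetical parameter names (k past periods matched, ell neighbors); in this line of work, many such (k, ell) experts are actually combined, whereas the sketch keeps one expert for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def nn_portfolio(history, k=2, ell=10):
    """One step of a simplified nearest-neighbor portfolio selector (sketch).
    history: (t, d) array of past price-relative vectors x_1, ..., x_t."""
    t, d = history.shape
    if t < k + ell + 1:
        return np.full(d, 1.0 / d)                 # not enough data: uniform portfolio
    query = history[-k:].ravel()
    segments = np.stack([history[i:i + k].ravel() for i in range(t - k)])
    nearest = np.argsort(np.linalg.norm(segments - query, axis=1))[:ell]
    follow = history[nearest + k]                  # returns right after each matched segment
    # Choose the fixed portfolio maximizing empirical log-wealth over the matches.
    res = minimize(lambda b: -np.log(follow @ b + 1e-12).sum(),
                   np.full(d, 1.0 / d),
                   bounds=[(0.0, 1.0)] * d,
                   constraints=({'type': 'eq', 'fun': lambda b: b.sum() - 1.0},))
    return res.x
```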
The setting is a stationary, ergodic time series. The challenge is to construct a sequence of functions, each based on only a finite segment of the past, which together provide a strongly consistent estimator for the conditional probability of the next observation, given the infinite past. Ornstein gave such a construction for the case where the values are from a finite set, and recently Algoet extended the scheme to time series with coordinates in a Polish space. The present study presents a different solution to the challenge. The algorithm is simple and its verification is fairly transparent. Some extensions to regression, pattern recognition, and on-line forecasting are mentioned.
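A toy version of the pattern-matching idea for a finite alphabet (the names are illustrative, and the actual scheme chooses the matched block length in a data-driven way that grows with n): estimate the conditional law of the next symbol by the empirical distribution of what followed past occurrences of the current length-k suffix.

```python
import numpy as np

def next_symbol_distribution(x, k):
    """Empirical conditional distribution of the next symbol given the
    length-k suffix, from past recurrences of that suffix (toy sketch)."""
    n = len(x)
    suffix = tuple(x[n - k:])
    follows = [x[i + k] for i in range(n - k) if tuple(x[i:i + k]) == suffix]
    if not follows:
        return {}                                  # suffix never seen before
    values, counts = np.unique(follows, return_counts=True)
    return dict(zip(values.tolist(), (counts / len(follows)).tolist()))

# Toy check on a two-state Markov chain with P(1|0)=0.2, P(1|1)=0.6.
rng = np.random.default_rng(3)
x, state = [], 0
for _ in range(20_000):
    x.append(state)
    state = int(rng.random() < (0.2 if state == 0 else 0.6))
print(next_symbol_distribution(x, k=1))   # close to the transition row of the last state
```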
Let $(X, Y)$ be an $\mathbb{R}^d \times \mathbb{R}$-valued regression pair, where $X$ has a density and $Y$ is bounded. If $n$ i.i.d. samples are drawn from this distribution, the Nadaraya-Watson kernel regression estimate in $\mathbb{R}^d$ with the Hilbert kernel $K(x) = 1/\|x\|^d$ is shown to converge weakly for all such regression pairs. We also show that strong convergence cannot be obtained. This is particularly interesting, as this regression estimate does not have a smoothing parameter.
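The estimate itself translates directly into code; a minimal sketch follows (the tiny-distance guard is an implementation convenience, not part of the definition):

```python
import numpy as np

def hilbert_kernel_regression(X, y, x0):
    """Nadaraya-Watson estimate with the Hilbert kernel K(u) = 1/||u||^d.
    Note the absence of any bandwidth or smoothing parameter."""
    d = X.shape[1]
    dist = np.linalg.norm(X - x0, axis=1)
    w = 1.0 / np.maximum(dist, 1e-12) ** d   # guard: query coinciding with a sample
    return np.dot(w, y) / w.sum()

# Toy check: recover sin(3 * x1) at a query point from noisy samples.
rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=2000)
print(hilbert_kernel_regression(X, y, np.array([0.3, -0.2])), np.sin(0.9))
```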