We study the relationship between the distribution of data, on the one hand, and classifier performance, on the other, for non-parametric classifiers. It is shown that predictable factors such as the available amount of training data (relative to the dimensionality of the feature space), the spatial variability of the effective average distance between data samples, and the type and amount of noise in the data set influence such classifiers to a significant degree. The methods developed here can be used to gain a detailed understanding of classifier design and selection.
Estimating the joint probability density function of a dataset is a central task in many machine learning applications. In this work we address the fundamental problem of kernel bandwidth estimation for variable kernel density estimation in high-dimensional feature spaces. We derive a variable kernel bandwidth estimator by minimizing a leave-one-out entropy objective function and show that this estimator performs estimation in high-dimensional feature spaces with great success. We compare this estimator to state-of-the-art maximum-likelihood estimators on a number of representative high-dimensional machine learning tasks and show that the newly introduced minimum leave-one-out entropy estimator performs best on a number of the high-dimensional datasets considered.
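The abstract above does not give the estimator's details, but the core idea of leave-one-out entropy minimization for kernel density estimation can be sketched as follows. This is a minimal illustration, not the paper's method: it uses a single isotropic Gaussian bandwidth selected on a grid, whereas the paper derives a variable (per-point) bandwidth; the function and variable names are my own.

```python
import numpy as np

def loo_entropy(X, h):
    """Leave-one-out entropy of a Gaussian KDE with scalar bandwidth h.

    X: (n, d) data matrix; h: isotropic bandwidth (a simplification of
    the variable per-point bandwidth described in the abstract).
    """
    n, d = X.shape
    # Pairwise squared Euclidean distances between all samples.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * h**2)) / ((2 * np.pi * h**2) ** (d / 2))
    np.fill_diagonal(K, 0.0)          # leave-one-out: drop the self term
    p = K.sum(axis=1) / (n - 1)       # LOO density estimate at each x_i
    return -np.mean(np.log(p + 1e-300))

def min_loo_entropy_bandwidth(X, grid):
    """Return the bandwidth on `grid` that minimizes the LOO entropy."""
    entropies = [loo_entropy(X, h) for h in grid]
    return grid[int(np.argmin(entropies))]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))         # 200 samples in 3 dimensions
grid = np.linspace(0.1, 2.0, 40)
h_star = min_loo_entropy_bandwidth(X, grid)
print(h_star)
```

Minimizing the leave-one-out entropy is equivalent to maximizing the leave-one-out log-likelihood; excluding the self term is what prevents the degenerate solution h → 0, which would otherwise score each point against itself.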