Dirk Ormoneit scite author profile

Many approaches to reinforcement learning combine neural networks or other parametric function approximators with a form of temporal-difference learning to estimate the value function of a Markov Decision Process. A significant disadvantage of these procedures is that the resulting learning algorithms are frequently unstable. In this work, we present a new, kernel-based approach to reinforcement learning which overcomes this difficulty and provably converges to a unique solution. In contrast to existing algorithms, our method can also be shown to be consistent in the sense that its costs converge to the optimal costs asymptotically. Our focus is on learning in an average-cost framework and on a practical application to the optimal portfolio choice problem.

Hoeffding's inequality for uniformly ergodic Markov chains

Glynn

Ormoneit

2002

Statistics & Probability Letters

110

101

Averaging, maximum penalized likelihood and Bayesian estimation for improving Gaussian mixture probability density estimates

Ormoneit

Tresp

1998

IEEE Trans. Neural Netw.

We apply the idea of averaging ensembles of estimators to probability density estimation. In particular, we use Gaussian mixture models, which are important components in many neural-network applications. We investigate the performance of averaging using three data sets. For comparison, we employ two traditional regularization approaches, i.e., a maximum penalized likelihood approach and a Bayesian approach. In the maximum penalized likelihood approach we use penalty functions derived from conjugate Bayesian priors such that an expectation maximization (EM) algorithm can be used for training. In all experiments, the maximum penalized likelihood approach and averaging improved performance considerably compared with a maximum likelihood approach. In two of the experiments, the maximum penalized likelihood approach outperformed averaging. In one experiment averaging was clearly superior. Our conclusion is that maximum penalized likelihood gives good results if the penalty term in the cost function is appropriate for the particular problem. If this is not the case, averaging is superior since it shows greater robustness by not relying on any particular prior assumption. The Bayesian approach worked very well on a low-dimensional toy problem but failed to give good performance in higher-dimensional problems.
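The averaging idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional data, plain maximum-likelihood EM, and bootstrap resampling as the way ensemble members are made to differ. The names `fit_gmm_1d`, `gmm_pdf`, `averaged_pdf` and the `var_floor` safeguard are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian(x, mean, var):
    """Univariate normal density, broadcast over components."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fit_gmm_1d(x, k, n_iter=50, seed=None, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by maximum-likelihood EM."""
    rng = np.random.default_rng(seed)
    means = rng.choice(x, size=k, replace=False)   # initialise at data points
    variances = np.full(k, np.var(x))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = weights * gaussian(x[:, None], means, variances)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        # Assumed safeguard against collapsing components (not from the paper).
        variances = np.maximum(variances, var_floor)
    return weights, means, variances

def gmm_pdf(grid, params):
    """Evaluate a fitted mixture density on a grid of points."""
    weights, means, variances = params
    return (weights * gaussian(grid[:, None], means, variances)).sum(axis=1)

def averaged_pdf(x, grid, k=3, n_models=10, seed=0):
    """Average the densities of mixtures fitted to bootstrap resamples."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(grid)
    for m in range(n_models):
        sample = rng.choice(x, size=len(x), replace=True)
        total += gmm_pdf(grid, fit_gmm_1d(sample, k, seed=seed + m + 1))
    return total / n_models
```

Because each ensemble member is itself a normalised density, their arithmetic mean is again a valid density, which is what makes averaging applicable to density estimation without further renormalisation.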

Representing cyclic human motion using functional analysis

Ormoneit

Black

Hastie

et al. 2005

Image and Vision Computing

We present a robust automatic method for modeling cyclic 3D human motion such as walking using motion-capture data. The pose of the body is represented by a time series of joint angles which are automatically segmented into a sequence of motion cycles. The mean and the principal components of these cycles are computed using a new algorithm that enforces smooth transitions between the cycles by operating in the Fourier domain. Key to this method is its ability to automatically deal with noise and missing data. A learned walking model is then exploited for Bayesian tracking of 3D human motion.
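As a rough illustration of computing the mean and principal components of motion cycles in the Fourier domain, the sketch below makes strong simplifying assumptions: the cycles are already segmented and resampled to a common length, there is no missing data, and smoothness is imposed simply by keeping a few low-frequency harmonics. It is not the algorithm from the paper, and the function name `fourier_cycle_pca` and its parameters are hypothetical.

```python
import numpy as np

def fourier_cycle_pca(cycles, n_harmonics=5, n_components=2):
    """Mean and principal components of band-limited motion cycles.

    cycles: array of shape (n_cycles, n_samples), one joint-angle cycle
            per row, all resampled to the same length (an assumption).
    """
    n_samples = cycles.shape[1]
    # Move each cycle to the Fourier domain.
    coeffs = np.fft.rfft(cycles, axis=1)
    # Keep the constant term plus the first n_harmonics frequencies.
    # A truncated Fourier series is periodic, so the reconstructed
    # cycle joins smoothly with its own repetition.
    coeffs[:, n_harmonics + 1:] = 0.0
    smooth = np.fft.irfft(coeffs, n=n_samples, axis=1)
    # Mean cycle and principal components via SVD of the centred data.
    mean = smooth.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(smooth - mean, full_matrices=False)
    return mean, vt[:n_components], singular_values[:n_components]
```

Any cycle can then be approximated as the mean plus a weighted sum of the returned components, which gives a low-dimensional description of a walking style under the assumptions above.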

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA


Kernel-based reinforcement learning in average-cost problems
