Amir-massoud Farahmand scite author profile

Intuitively, learning should be easier when the data points lie on a low-dimensional submanifold of the input space. Recently there has been a growing interest in algorithms that aim to exploit such geometrical properties of the data. Oftentimes these algorithms require estimating the dimension of the manifold first. In this paper we propose an algorithm for dimension estimation and study its finite-sample behaviour. The algorithm estimates the dimension locally around the data points using nearest neighbor techniques and then combines these local estimates. We show that the rate of convergence of the resulting estimate is independent of the dimension of the input space and hence the algorithm is "manifold-adaptive". Thus, when the manifold supporting the data is low dimensional, the algorithm can be exponentially more efficient than its counterparts that are not exploiting this property. Our computer experiments confirm the obtained theoretical results.
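
The local-then-combine idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's reference implementation: the abstract does not give a formula, so the local estimate ln 2 / ln(r_k / r_ceil(k/2)) (a ratio of nearest-neighbor distances), the combination by plain averaging, and the names `local_dimension`, `estimate_dimension`, and `sample_plane_in_5d` are all assumptions made for the example.

```python
import math
import random


def local_dimension(points, i, k):
    """Local estimate ln 2 / ln(r_k / r_ceil(k/2)) around points[i] (assumed form)."""
    xi = points[i]
    dists = sorted(math.dist(xi, xj) for j, xj in enumerate(points) if j != i)
    r_k = dists[k - 1]                      # distance to the k-th nearest neighbor
    r_half = dists[math.ceil(k / 2) - 1]    # distance to the ceil(k/2)-th neighbor
    if r_half == 0.0 or r_k <= r_half:
        return None                         # degenerate neighborhood (ties, duplicates)
    return math.log(2.0) / math.log(r_k / r_half)


def estimate_dimension(points, k=20):
    """Combine the local estimates by averaging (one possible combination rule)."""
    local = [local_dimension(points, i, k) for i in range(len(points))]
    local = [d for d in local if d is not None]
    if not local:
        raise ValueError("no usable local estimates")
    return sum(local) / len(local)


def sample_plane_in_5d(n, seed=0):
    """Hypothetical test data: a 2-D plane linearly embedded in 5-D space."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        pts.append((u, v, u + v, u - v, 0.5 * u))
    return pts
```

Calling `estimate_dimension(sample_plane_in_5d(500))` returns a real-valued average that can be rounded to an integer dimension. The estimate depends only on distances between data points, which is why the ambient dimension (5 here) does not enter the computation.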

Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems

Farahmand, Ghavamzadeh, Szepesvári, et al. 2009

Reinforcement learning with linear and non-linear function approximation has been studied extensively in the last decade. However, as opposed to other fields of machine learning such as supervised learning, the effect of finite samples has not been thoroughly addressed within the reinforcement learning framework. In this paper we propose to use regularization in reinforcement learning and planning. More specifically, we control the complexity of the value function approximation using L2 regularization. We consider the fitted Q-iteration algorithm and provide generalization bounds that account for small sample sizes. A realistic visual-servoing problem is used to illustrate the benefits of using a regularized procedure.

Model selection in reinforcement learning

Farahmand, Szepesvári 2011, Mach Learn

We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting when the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BERMIN, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider a problem in which the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that under some additional technical conditions BERMIN leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.

Global visual-motor estimation for uncalibrated visual servoing

Farahmand, Shademan, Jägersand 2007

Deep reinforcement learning for partial differential equation control

Farahmand, Nabi, Nikovski 2017

Hill Climbing on Value Estimates for Search-control in Dyna

Pan, Yao, Farahmand, et al. 2019

Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search-control, the mechanism to generate the state and action from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) on the current estimate of the value function. This has the effect of propagating value from high-value regions and of preemptively updating value estimates of the regions that the agent is likely to visit next. We derive a noisy projected natural gradient algorithm for hill climbing, and highlight a connection to Langevin dynamics. We provide an empirical demonstration on four classical domains that our algorithm, HC-Dyna, can obtain significant sample efficiency improvements. We study the properties of different sampling distributions for search-control, and find that there appears to be a benefit specifically from using the samples generated by climbing on current value estimates from low-value to high-value regions.

Robust Jacobian estimation for uncalibrated visual servoing

Shademan, Farahmand, Jägersand 2010

This paper addresses robust estimation of the uncalibrated visual-motor Jacobian for an image-based visual servoing (IBVS) system. The proposed method does not require knowledge of model or system parameters and is robust to outliers caused by various visual tracking errors, such as occlusion or mis-tracking. Previous uncalibrated methods are not robust to outliers and assume that the visual-motor data belong to the underlying model. In unstructured environments, this assumption may not hold. Outliers to the visual-motor model may deteriorate the Jacobian, which can make the system unstable or drive the arm in the wrong direction. We propose to apply a statistically robust M-estimator to reject the outliers. We compare the quality of the robust Jacobian estimation with the least squares-based estimation. The effect of outliers on the estimation quality is studied through MATLAB simulations and eye-in-hand visual servoing experiments using a WAM arm. Experimental results show that the Jacobian estimated by robust M-estimation is robust when up to 40% of the visual-motor data are outliers.
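
To make the contrast between least squares and M-estimation concrete, here is a deliberately reduced sketch. It assumes a scalar "Jacobian" j in the model y = j * x (one image feature, one joint), Huber weights, and a MAD-based scale, fitted by iteratively reweighted least squares. The abstract does not say which M-estimator or fitting scheme the paper uses, so these choices, the function names, and the synthetic data in `make_demo_data` are assumptions for illustration only.

```python
import statistics


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope for the model y = j * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def huber_slope(xs, ys, c=1.345, iters=50):
    """Huber M-estimate of j via iteratively reweighted least squares."""
    # Robust starting point: median of the pointwise ratios y / x.
    j = statistics.median(y / x for x, y in zip(xs, ys) if x != 0)
    for _ in range(iters):
        residuals = [y - j * x for x, y in zip(xs, ys)]
        # Robust scale from the median absolute deviation of the residuals.
        med = statistics.median(residuals)
        scale = 1.4826 * statistics.median(abs(r - med) for r in residuals)
        scale = max(scale, 1e-12)
        # Huber weights: 1 inside the threshold, shrinking as 1/|r| outside it.
        weights = [1.0 if abs(r) <= c * scale else c * scale / abs(r)
                   for r in residuals]
        j = (sum(w * x * y for w, x, y in zip(weights, xs, ys))
             / sum(w * x * x for w, x in zip(weights, xs)))
    return j


def make_demo_data():
    """Hypothetical data: true slope 2.0, with every fourth sample an outlier."""
    xs = [0.1 * i for i in range(1, 21)]
    ys = []
    for i, x in enumerate(xs, start=1):
        if i % 4 == 0:
            ys.append(-3.0 * x)                      # outlier, e.g. a mis-track
        else:
            ys.append(2.0 * x + 0.01 * (-1) ** i)    # inlier with small noise
    return xs, ys
```

On the data from `make_demo_data()`, a quarter of the samples are outliers; `least_squares_slope` is pulled far from the true slope of 2.0, while `huber_slope` stays close to it. A full visual-motor Jacobian is a matrix, so a realistic version would solve a weighted linear system per iteration rather than a scalar ratio.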

Regularized Fitted Q-Iteration: Application to Planning

Farahmand, Ghavamzadeh, Szepesvári, et al. 2008

We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine to address the problem of controlling model complexity. The algorithm is presented in detail for the case when the function space is a reproducing-kernel Hilbert space underlying a user-chosen kernel function. We derive bounds on the quality of the solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example is used to illustrate the benefits of using a penalized procedure.
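
The loop described in this abstract, fitted Q-iteration with a penalized least-squares regression step, can be sketched on a toy problem. This is a simplified illustration and not the paper's algorithm: it swaps the reproducing-kernel Hilbert space for indicator (tabular) features, for which the L2-penalized least-squares fit has a closed form per state-action pair. The chain environment, the penalty value, and all function names are assumptions made for the example.

```python
def chain_transitions(n_states=5, repeats=5):
    """Hypothetical generative model: a deterministic chain, actions 0=left, 1=right."""
    transitions = []
    for s in range(n_states):
        for a in (0, 1):
            s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            reward = 1.0 if s_next == n_states - 1 else 0.0
            transitions.extend([(s, a, reward, s_next)] * repeats)
    return transitions


def fitted_q_iteration(transitions, n_states, n_actions,
                       gamma=0.9, lam=0.1, iters=100):
    """Fitted Q-iteration with an L2-penalized least-squares regression step."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        sums = [[0.0] * n_actions for _ in range(n_states)]
        counts = [[0] * n_actions for _ in range(n_states)]
        for s, a, r, s_next in transitions:
            # Regression target: one-step Bellman backup under the current Q.
            sums[s][a] += r + gamma * max(q[s_next])
            counts[s][a] += 1
        # With indicator features the penalized fit decouples per (s, a):
        # minimizing sum_i (target_i - w)^2 + lam * w^2 gives w = sum / (count + lam).
        q = [[sums[s][a] / (counts[s][a] + lam) for a in range(n_actions)]
             for s in range(n_states)]
    return q


def greedy_policy(q):
    """Action with the largest estimated value in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

For example, `greedy_policy(fitted_q_iteration(chain_transitions(), 5, 2))` gives the policy implied by the fitted values. Increasing `lam` shrinks every fitted value toward zero, which is the complexity-control effect of the penalty in this tabular setting; with a kernel-based function space the same penalty would instead control the smoothness of the fitted function.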
