Miguel Calvo-Fullana scite author profile

In this paper, we study the learning of safe policies in the setting of reinforcement learning problems. That is, we aim to control a Markov Decision Process (MDP) whose transition probabilities are unknown, but for which we have access to sample trajectories through experience. We define safety as the agent remaining in a desired safe set with high probability during the operation time. We therefore consider a constrained MDP where the constraints are probabilistic. Since there is no straightforward way to optimize the policy with respect to the probabilistic constraint in a reinforcement learning framework, we propose an ergodic relaxation of the problem. The advantages of the proposed relaxation are threefold. (i) The safety guarantees are maintained in the case of episodic tasks, and they are kept up to a given time horizon for continuing tasks. (ii) The constrained optimization problem, despite its non-convexity, has an arbitrarily small duality gap if the parametrization of the policy is rich enough. (iii) The gradients of the Lagrangian associated with the safe-learning problem can be easily computed using standard policy gradient results and stochastic approximation tools. Leveraging these advantages, we establish that primal-dual algorithms are able to find policies that are safe and optimal. We test the proposed approach in a navigation task in a continuous domain. The numerical results show that our algorithm is capable of dynamically adapting the policy to the environment and the required safety levels.
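The primal-dual idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm or experiment: the two-armed bandit, its reward and safety numbers, the function name `primal_dual_bandit`, and all step sizes are hypothetical. Two simplifications are assumed: exact gradients stand in for sampled policy-gradient estimates, and a small entropy bonus is added only to damp the oscillations that plain gradient ascent-descent shows on this toy problem.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def primal_dual_bandit(safety_level=0.8, entropy_weight=0.1, step=0.5, iters=5000):
    """Toy primal-dual iteration on a hypothetical two-armed bandit.

    Risky arm: reward 1.0, stays safe with probability 0.5.
    Safe arm:  reward 0.5, stays safe with probability 1.0.
    The policy picks the risky arm with probability p = sigmoid(theta).
    """
    theta, lam = 0.0, 0.0
    for _ in range(iters):
        p = sigmoid(theta)
        # Expected reward is 0.5 + 0.5 * p, expected safety is 1 - 0.5 * p.
        d_reward = 0.5
        d_safety = -0.5
        # Derivative of the binary entropy w.r.t. p is ln((1 - p) / p) = -theta.
        d_entropy = -theta
        # Gradient of the (entropy-regularized) Lagrangian w.r.t. p.
        d_lagrangian = d_reward + lam * d_safety + entropy_weight * d_entropy
        # Primal ascent on theta (chain rule through the sigmoid).
        theta += step * d_lagrangian * p * (1.0 - p)
        # Dual descent on the multiplier, projected onto lam >= 0.
        safety = 1.0 - 0.5 * sigmoid(theta)
        lam = max(0.0, lam - step * (safety - safety_level))
    return sigmoid(theta), lam


if __name__ == "__main__":
    p, lam = primal_dual_bandit()
    print(f"P(risky arm) = {p:.3f}, multiplier = {lam:.3f}")
```

In this toy setup the safety constraint is active at the optimum, so the multiplier settles at a positive value and the policy takes the risky arm only as often as the required safety level allows.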


Sparsity-promoting sensor selection with energy harvesting constraints

Calvo-Fullana

Matamoros

Antón-Haro

et al. 2016


In this paper, we propose a novel sensor selection scheme for networks equipped with energy harvesting sensing devices. Ultimately, the goal is to minimize the reconstruction distortion at the fusion center by selecting a reduced (i.e., sparse) yet informative enough subset of sensors. The solution must also fulfill the causality constraints associated with the energy harvesting process. For a classical formulation, the optimization problem turns out to be nonconvex. To circumvent that, we promote sparsity directly in the power allocation vector by introducing a log-sum penalty term in the cost function. The problem can be solved iteratively by resorting to a majorization-minimization procedure, which leads to a stationary point. Numerical results reveal that, by using a log-sum penalty term, the sensor selection scheme outperforms others based on the ℓ1 norm while making effective use of the harvested energy.
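A minimal sketch of how a log-sum penalty is typically handled by majorization-minimization follows. It is a toy, not the paper's distortion problem: the separable quadratic fit term, the target vector, the function name `log_sum_mm`, and the values of `penalty` and `eps` are all assumptions. Because the logarithm is concave, its tangent at the current iterate is an upper bound, so each step reduces to a weighted ℓ1 problem, which here has a closed-form solution.

```python
def log_sum_mm(targets, penalty=0.05, eps=0.1, iters=50):
    """Minimize 0.5 * sum((p_i - a_i)^2) + penalty * sum(log(p_i + eps)), p >= 0.

    Each MM step replaces log(p_i + eps) by its tangent at the current
    iterate, giving a weighted l1 problem solved by a shifted projection.
    """
    p = list(targets)
    for _ in range(iters):
        # Weights from linearizing the log around the current point.
        weights = [1.0 / (pi + eps) for pi in p]
        # Closed-form minimizer of the weighted-l1 surrogate with p >= 0.
        p = [max(0.0, a - penalty * w) for a, w in zip(targets, weights)]
    return p


if __name__ == "__main__":
    allocation = log_sum_mm([1.0, 0.5, 0.1, 0.05])
    print([round(x, 4) for x in allocation])
```

Small entries receive ever larger weights and are driven exactly to zero, while large entries are only mildly shrunk. This is the sparsity-promoting behavior that the abstract contrasts with a plain ℓ1 penalty.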


Learning Safe Policies via Primal-Dual Methods

Paternain

Calvo-Fullana

Chamon

et al. 2019


Constrained Reinforcement Learning Has Zero Duality Gap

Paternain

Chamon

Calvo-Fullana

et al. 2019

Preprint


ROS-NetSim: A Framework for the Integration of Robotic and Network Simulators

Calvo-Fullana

Mox

Pyattaev

et al. 2021

IEEE Robot. Autom. Lett.


Sensor Selection and Power Allocation Strategies for Energy Harvesting Wireless Sensor Networks

Calvo-Fullana

Matamoros

Antón-Haro

2016

IEEE J. Select. Areas Commun.


In this paper, we investigate the problem of jointly selecting a predefined number of energy-harvesting (EH) sensors and computing the optimal power allocation. The ultimate goal is to minimize the reconstruction distortion at the fusion center. This optimization problem is, unfortunately, non-convex. To circumvent that, we propose two suboptimal strategies: (i) a joint sensor selection and power allocation (JSS-EH) scheme that, we prove, is capable of iteratively finding a stationary solution of the original problem from a sequence of surrogate convex problems; and (ii) a separate sensor selection and power allocation (SS-EH) scheme, on which basis we can identify a sensible sensor selection and analytically find a power allocation policy by solving a convex problem. We also discuss the interplay between the two strategies. Performance in terms of reconstruction distortion, impact of initialization, actual subsets of selected sensors and computed power allocation policies, etc., is assessed by means of computer simulations. To that aim, an EH-agnostic sensor selection strategy, a lower bound on distortion, and an online version of the SS-EH and JSS-EH schemes are derived and used for benchmarking.


Safe Policies for Reinforcement Learning via Primal-Dual Methods

Paternain

Calvo-Fullana

Chamon

et al. 2023

IEEE Trans. Automat. Contr.


Reconstruction of Correlated Sources With Energy Harvesting Constraints in Delay-Constrained and Delay-Tolerant Communication Scenarios

Calvo-Fullana

Matamoros

Antón-Haro

2017

IEEE Trans. Wireless Commun.


In this paper, we investigate the reconstruction of time-correlated sources in a point-to-point communications scenario comprising an energy-harvesting sensor and a Fusion Center (FC). Our goal is to minimize the average distortion in the reconstructed observations by using data from previously encoded sources as side information. First, we analyze a delay-constrained scenario, where the sources must be reconstructed before the next time slot. We formulate the problem in a convex optimization framework and derive the optimal transmission (i.e., power and rate allocation) policy. To solve this problem, we propose an iterative algorithm based on the subgradient method. Interestingly, the solution to the problem consists of a coupling between a two-dimensional directional water-filling algorithm (for power allocation) and a reverse water-filling algorithm (for rate allocation). Then we find a more general solution to this problem in a delay-tolerant scenario where the time horizon for source reconstruction is extended to multiple time slots. Finally, we provide some numerical results that illustrate the impact of delay and correlation in the power and rate allocation policies, and in the resulting reconstruction distortion. We also discuss the performance gap exhibited by a heuristic online policy derived from the optimal (offline) one.
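For readers unfamiliar with the rate-allocation building block named in this abstract, here is a minimal sketch of classical reverse water-filling for independent Gaussian sources. It is the textbook version only, not the paper's coupled scheme with energy harvesting and side information. The function name `reverse_water_filling`, the variances, and the distortion budget are hypothetical. Each source gets distortion min(level, variance), and the common water level is found by bisection so that the distortions sum to the budget.

```python
import math


def reverse_water_filling(variances, total_distortion, iters=60):
    """Classical reverse water-filling for independent Gaussian sources.

    Returns per-source distortions D_i = min(level, variance_i) and rates
    R_i = 0.5 * log2(variance_i / D_i) in bits per sample.
    Assumes 0 < total_distortion <= sum(variances).
    """
    lo, hi = 0.0, max(variances)
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        # Total distortion is nondecreasing in the water level.
        if sum(min(level, v) for v in variances) < total_distortion:
            lo = level
        else:
            hi = level
    level = 0.5 * (lo + hi)
    distortions = [min(level, v) for v in variances]
    rates = [0.5 * math.log2(v / d) for v, d in zip(variances, distortions)]
    return distortions, rates


if __name__ == "__main__":
    distortions, rates = reverse_water_filling([4.0, 1.0, 0.25], 1.5)
    print([round(d, 4) for d in distortions], [round(r, 4) for r in rates])
```

Sources whose variance falls below the water level receive zero rate, which is why the weakest source in this example is not encoded at all.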


