Kunal Menda scite author profile

While imitation learning is often used in robotics, the approach frequently suffers from data mismatch and compounding errors. DAgger is an iterative algorithm that addresses these issues by aggregating training data from both the expert and novice policies, but does not consider the impact of safety. We present a probabilistic extension to DAgger, which attempts to quantify the confidence of the novice policy as a proxy for safety. Our method, EnsembleDAgger, approximates a Gaussian Process using an ensemble of neural networks. Using the variance as a measure of confidence, we compute a decision rule that captures how much we doubt the novice, thus determining when it is safe to allow the novice to act. With this approach, we aim to maximize the novice's share of actions, while constraining the probability of failure. We demonstrate improved safety and learning performance compared to other DAgger variants and classic imitation learning on an inverted pendulum and in the MuJoCo HalfCheetah environment.
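The variance-based decision rule described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the ensemble members are hand-written stand-in functions rather than trained neural networks, and the names `ensemble_stats`, `choose_action`, and `doubt_threshold` are hypothetical.

```python
from statistics import mean, pvariance
from typing import Callable, Sequence, Tuple

Policy = Callable[[float], float]


def ensemble_stats(ensemble: Sequence[Policy], obs: float) -> Tuple[float, float]:
    """Return the mean action and the variance of actions across ensemble members."""
    actions = [member(obs) for member in ensemble]
    return mean(actions), pvariance(actions)


def choose_action(
    ensemble: Sequence[Policy],
    expert: Policy,
    obs: float,
    doubt_threshold: float,
) -> Tuple[float, str]:
    """Let the novice act only when ensemble disagreement (variance) is small.

    Assumption: a single scalar variance threshold stands in for the
    decision rule; the paper's actual rule may include further conditions.
    """
    novice_action, doubt = ensemble_stats(ensemble, obs)
    if doubt <= doubt_threshold:
        return novice_action, "novice"
    return expert(obs), "expert"


# Hypothetical ensemble: members agree near obs = 0 and diverge away from it.
ensemble = [lambda x, k=k: -x + k * x * x for k in (-0.5, 0.0, 0.5)]
expert = lambda x: -x

near = choose_action(ensemble, expert, obs=0.1, doubt_threshold=0.01)
far = choose_action(ensemble, expert, obs=2.0, doubt_threshold=0.01)
```

In this toy setup the novice keeps control close to the region where the members agree, and control passes to the expert where they disagree.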

Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes

Menda, Chen, Grana et al. 2019

IEEE Trans. Intell. Transport. Syst.

The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies Generalized Advantage Estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as "event-driven decision processes," which are scenarios where the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely-separated events occur, adversely affecting the policies learned. Additionally, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.
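One way to picture advantage estimation over temporally extended actions is to discount by the elapsed time between decision events instead of by a fixed step. The sketch below makes that assumption explicit; it is not necessarily the modification used in the paper, and `event_driven_gae`, `durations`, and the other parameter names are hypothetical.

```python
from typing import List, Sequence


def event_driven_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    durations: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Advantage estimates where each action k lasts durations[k] time units.

    Assumptions (not taken from the paper):
      * rewards[k] is the reward accumulated while action k was executing,
      * values has one extra entry, the bootstrap value after the last action,
      * discounting between decisions is gamma ** durations[k].
    With every duration equal to 1 this reduces to standard GAE.
    """
    if len(values) != len(rewards) + 1 or len(durations) != len(rewards):
        raise ValueError("expected len(values) == len(rewards) + 1 == len(durations) + 1")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each advantage can reuse the one that follows it.
    for k in reversed(range(len(rewards))):
        discount = gamma ** durations[k]
        delta = rewards[k] + discount * values[k + 1] - values[k]
        running = delta + (gamma * lam) ** durations[k] * running
        advantages[k] = running
    return advantages


unit_steps = event_driven_gae([1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0], gamma=0.5, lam=1.0)
long_first = event_driven_gae([1.0, 1.0], [0.0, 0.0, 0.0], [2.0, 1.0], gamma=0.5, lam=1.0)
```

A longer first action discounts the later reward more heavily, so its advantage estimate is smaller than in the unit-step case.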

A General Framework for Structured Learning of Mechanical Systems

Gupta, Menda, Manchester et al. 2019

Preprint

Learning accurate dynamics models is necessary for optimal, compliant control of robotic systems. Current approaches to white-box modeling using analytic parameterizations, or black-box modeling using neural networks, can suffer from high bias or high variance. We address the need for a flexible, gray-box model of mechanical systems that can seamlessly incorporate prior knowledge where it is available, and train expressive function approximators where it is not. We propose to parameterize a mechanical system using neural networks to model its Lagrangian and the generalized forces that act on it. We test our method on a simulated, actuated double pendulum. We show that our method outperforms a naive, black-box model in terms of data-efficiency, as well as performance in model-based reinforcement learning. We also conduct a systematic study of our method's ability to incorporate available prior knowledge about the system to improve data efficiency.

Explaining COVID-19 outbreaks with reactive SEIRD models

Menda, Laird, Kochenderfer et al. 2021

Sci Rep

COVID-19 epidemics have varied dramatically in nature across the United States, where some counties have clear peaks in infections, and others have had a multitude of unpredictable and non-distinct peaks. Our lack of understanding of how the pandemic has evolved leads to increasing errors in our ability to predict the spread of the disease. This work seeks to explain this diversity in epidemic progressions by considering an extension to the compartmental SEIRD model. The model we propose uses a neural network to predict the infection rate as a function of both time and the disease’s prevalence. We provide a methodology for fitting this model to available county-level data describing aggregate cases and deaths. Our method uses Expectation-Maximization to overcome the challenge of partial observability, due to the fact that the system’s state is only partially reflected in available data. We fit a single model to data from multiple counties in the United States exhibiting different behavior. By simulating the model, we show that it can exhibit both single peak and multi-peak behavior, reproducing behavior observed in counties both in and out of the training set. We then compare the error of simulations from our model with a standard SEIRD model, and show that ours substantially reduces errors. We also use simulated data to compare our methodology for handling partial observability with a standard approach, showing that ours is significantly better at estimating the values of unobserved quantities.
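To make the idea of a reactive infection rate concrete, here is a minimal discrete-time SEIRD simulation in which the infection rate depends on time and on current prevalence. It is only a sketch: the paper learns this function with a neural network and fits it with Expectation-Maximization, whereas `reactive_beta` below is a hand-written stand-in, and every parameter value is a made-up assumption.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float, float, float, float]  # (S, E, I, R, D) as population fractions


def reactive_beta(t: int, prevalence: float, beta0: float = 0.5, sensitivity: float = 50.0) -> float:
    """Hypothetical infection rate that drops as prevalence rises (t is unused here)."""
    return beta0 / (1.0 + sensitivity * prevalence)


def simulate_seird(
    days: int,
    beta_fn: Callable[[int, float], float],
    initial: State = (0.999, 0.0, 0.001, 0.0, 0.0),
    sigma: float = 0.2,   # assumed rate of E -> I
    gamma: float = 0.1,   # assumed rate of I -> R
    mu: float = 0.01,     # assumed rate of I -> D
) -> List[State]:
    """Step the compartments forward one day at a time with explicit Euler updates."""
    s, e, i, r, d = initial
    history: List[State] = []
    for t in range(days):
        beta = beta_fn(t, i)  # prevalence is taken to be the infectious fraction
        new_exposed = beta * s * i
        new_infectious = sigma * e
        new_recovered = gamma * i
        new_dead = mu * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered - new_dead
        r += new_recovered
        d += new_dead
        history.append((s, e, i, r, d))
    return history


trajectory = simulate_seird(days=200, beta_fn=reactive_beta)
```

Swapping `reactive_beta` for a constant function recovers a standard SEIRD model, which is the baseline the abstract compares against.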

Conditional Approximate Normalizing Flows for Joint Multi-Step Probabilistic Forecasting with Application to Electricity Demand

Jamgochian, Wu, Menda et al. 2022

Preprint

Some real-world decision-making problems require making probabilistic forecasts over multiple steps at once. However, methods for probabilistic forecasting may fail to capture correlations in the underlying time-series that exist over long time horizons as errors accumulate. One such application is with resource scheduling under uncertainty in a grid environment, which requires forecasting electricity demand that is inherently noisy, but often cyclic. In this paper, we introduce the conditional approximate normalizing flow (CANF) to make probabilistic multi-step time-series forecasts when correlations are present over long time horizons. We first demonstrate our method's efficacy on estimating the density of a toy distribution, finding that CANF improves the KL divergence by one-third compared to that of a Gaussian mixture model while still being amenable to explicit conditioning. We then use a publicly available household electricity consumption dataset to showcase the effectiveness of CANF on joint probabilistic multi-step forecasting. Empirical results show that conditional approximate normalizing flows outperform other methods in terms of multi-step forecast accuracy and lead to up to 10x better scheduling decisions. Our implementation is available at https://github.com/sisl/JointDemandForecasting.

EnsembleDAgger: A Bayesian Approach to Safe Imitation Learning

Menda, Driggs-Campbell, Kochenderfer 2018

Preprint

Multi-Vehicle Control in Roundabouts using Decentralized Game-Theoretic Planning

Jamgochian, Menda, Kochenderfer 2022

Preprint

Safe navigation in dense, urban driving environments remains an open problem and an active area of research. Unlike typical predict-then-plan approaches, game-theoretic planning considers how one vehicle's plan will affect the actions of another. Recent work has demonstrated significant improvements in the time required to find local Nash equilibria in general-sum games with nonlinear objectives and constraints. When applied trivially to driving, these works assume all vehicles in a scene play a game together, which can result in intractable computation times for dense traffic. We formulate a decentralized approach to game-theoretic planning by assuming that agents only play games within their observational vicinity, which we believe to be a more reasonable assumption for human driving. Games are played in parallel for all strongly connected components of an interaction graph, significantly reducing the number of players and constraints in each game, and therefore the time required for planning. We demonstrate that our approach can achieve collision-free, efficient driving in urban environments by comparing performance against an adaptation of the Intelligent Driver Model and centralized game-theoretic planning when navigating roundabouts in the INTERACTION dataset. Our implementation is available at http://github.com/sisl/DecNashPlanning.
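Splitting an interaction graph into strongly connected components, so that each component can be treated as its own smaller game, can be sketched with Tarjan's algorithm. This is an illustration only and is not taken from the linked repository: the graph encoding (an edge from one vehicle to another that it observes) and the vehicle labels are assumptions.

```python
from typing import Dict, Hashable, List, Sequence, Set


def strongly_connected_components(graph: Dict[Hashable, Sequence[Hashable]]) -> List[Set[Hashable]]:
    """Tarjan's algorithm: group nodes that can all reach one another."""
    index: Dict[Hashable, int] = {}
    lowlink: Dict[Hashable, int] = {}
    stack: List[Hashable] = []
    on_stack: Set[Hashable] = set()
    components: List[Set[Hashable]] = []
    counter = 0

    def visit(node: Hashable) -> None:
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for neighbor in graph.get(node, ()):
            if neighbor not in index:
                visit(neighbor)
                lowlink[node] = min(lowlink[node], lowlink[neighbor])
            elif neighbor in on_stack:
                lowlink[node] = min(lowlink[node], index[neighbor])
        # A node whose lowlink equals its own index is the root of a component.
        if lowlink[node] == index[node]:
            component: Set[Hashable] = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


# Hypothetical scene: A and B observe each other, C and D observe each other,
# D also observes A (one-way), and E interacts with no one.
interaction_graph = {"A": ["B"], "B": ["A"], "C": ["D"], "D": ["C", "A"], "E": []}
games = strongly_connected_components(interaction_graph)
```

Because the edge from D to A is one-way, the two pairs stay in separate components, so in this sketch three small games would be solved instead of one five-player game.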

Scalable Identification of Partially Observed Systems with Certainty-Equivalent EM

Menda, Becdelièvre, Gupta et al. 2020

Preprint
