Enabling robots to autonomously navigate complex environments is essential for real-world deployment. Prior methods approach this problem by having the robot maintain an internal map of the world, and then use a localization and planning method to navigate through the internal map. However, these approaches often include a variety of assumptions, are computationally intensive, and do not learn from failures. In contrast, learning-based methods improve as the robot acts in the environment, but are difficult to deploy in the real world due to their high sample complexity. To address the need to learn complex policies with few samples, we propose a generalized computation graph that subsumes value-based model-free methods and model-based methods, with specific instantiations interpolating between model-free and model-based. We then instantiate this graph to form a navigation model that learns from raw images and is sample efficient. Our simulated car experiments explore the design decisions of our navigation model, and show our approach outperforms single-step and N-step double Q-learning. We also evaluate our approach on a real-world RC car and show it can learn to navigate through a complex indoor environment with a few hours of fully autonomous, self-supervised training. Videos of the experiments and code can be found at github.com/gkahn13/gcg
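
The abstract compares against single-step and N-step double Q-learning. As a rough illustration of what such a baseline target looks like, here is a minimal sketch of the standard N-step double Q-learning target. It is not taken from the paper or its repository: the function name, argument names, and the plain-list representation of Q-values are all assumptions made for this example.

```python
def n_step_double_q_target(rewards, q_online_next, q_target_next,
                           gamma=0.99, done=False):
    """N-step return bootstrapped with a double Q-learning estimate.

    rewards:        the N rewards r_t, ..., r_{t+N-1} along the trajectory.
    q_online_next:  online-network Q-values at s_{t+N}, one per action.
    q_target_next:  target-network Q-values at the same state.
    done:           True if the episode ended within the N steps.
    (All names here are hypothetical, chosen for illustration.)
    """
    # Discounted sum of the N observed rewards.
    n_step_return = sum(gamma ** k * r for k, r in enumerate(rewards))
    if done:
        # No bootstrap term after a terminal state.
        return n_step_return
    # Double Q-learning: the online network selects the action,
    # the target network evaluates it.
    best_action = max(range(len(q_online_next)),
                      key=q_online_next.__getitem__)
    return n_step_return + gamma ** len(rewards) * q_target_next[best_action]


# Single-step (N = 1) and two-step (N = 2) targets for the same next state.
single_step = n_step_double_q_target([1.0], [0.2, 0.7], [4.0, 8.0], gamma=0.5)
two_step = n_step_double_q_target([1.0, 1.0], [0.2, 0.7], [4.0, 8.0], gamma=0.5)
```

With N = 1 this reduces to the usual single-step double Q-learning target. Larger N relies more on observed rewards and less on the bootstrapped estimate.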


Learning Highway Ramp Merging Via Reinforcement Learning with Temporally-Extended Actions

Triest, Villaflor, Dolan. 2020


Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation

Kahn, Villaflor, Ding, et al. 2017. Preprint


BATS: Best Action Trajectory Stitching

Char, Mehta, Villaflor, et al. 2022. Preprint


The problem of offline reinforcement learning focuses on learning a good policy from a log of environment interactions. Past efforts for developing algorithms in this area have revolved around introducing constraints to online reinforcement learning algorithms to ensure the actions of the learned policy are constrained to the logged data. In this work, we explore an alternative approach by planning on the fixed dataset directly. Specifically, we introduce an algorithm which forms a tabular Markov Decision Process (MDP) over the logged data by adding new transitions to the dataset. We do this by using learned dynamics models to plan short trajectories between states. Since exact value iteration can be performed on this constructed MDP, it becomes easy to identify which trajectories are advantageous to add to the MDP. Crucially, since most transitions in this MDP come from the logged data, trajectories from the MDP can be rolled out for long periods with confidence. We prove that this property allows one to make upper and lower bounds on the value function up to appropriate distance metrics. Finally, we demonstrate empirically how algorithms that uniformly constrain the learned policy to the entire dataset can result in unwanted behavior, and we show an example in which simply behavior cloning the optimal policy of the MDP created by our algorithm avoids this problem.
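
To make the idea of planning on a tabular MDP built from logged data more concrete, here is a minimal sketch that runs exact value iteration over logged transitions and shows how adding one stitching transition changes the values. It is not the BATS implementation: the deterministic (state, action, reward, next_state) tuples, the toy state names, and the hand-written stitching edge (which the paper would obtain by planning with learned dynamics models) are assumptions made for this example.

```python
from collections import defaultdict


def value_iteration(transitions, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Exact value iteration on a deterministic tabular MDP.

    transitions: iterable of (state, action, reward, next_state) tuples.
    States with no outgoing transition are treated as terminal (value 0).
    """
    outgoing = defaultdict(list)
    states = set()
    for s, a, r, s_next in transitions:
        outgoing[s].append((a, r, s_next))
        states.update((s, s_next))

    values = {s: 0.0 for s in states}
    for _ in range(max_iters):
        delta = 0.0
        for s in states:
            edges = outgoing.get(s)
            if not edges:
                continue  # terminal state keeps value 0
            best = max(r + gamma * values[s_next] for _, r, s_next in edges)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    return values


# Two logged trajectories (toy data): one low-reward, one high-reward.
logged = [
    ("s0", "a", 0.0, "s1"), ("s1", "a", 1.0, "s2"),
    ("t0", "a", 0.0, "t1"), ("t1", "a", 10.0, "t2"),
]
# Hypothetical stitching transition connecting the two trajectories.
stitch = [("s1", "b", 0.0, "t1")]

before = value_iteration(logged)
after = value_iteration(logged + stitch)
```

In this toy example the value of s0 rises from about 0.9 to about 8.1 once the stitching edge is added. This is the kind of comparison that tells you whether a candidate trajectory is worth adding to the MDP.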


Learning to Robustly Negotiate Bi-Directional Lane Usage in High-Conflict Driving Scenarios

Killing, Villaflor, Dolan. 2021



Adam Villaflor
