Hierarchical Program-Triggered Reinforcement Learning Agents for Automated Driving

Gangopadhyay, Briti; Soora, Harshit; Dasgupta, Pallab

doi:10.1109/tits.2021.3096998

Cited by 21 publications

(2 citation statements)

References 29 publications


“…There is a rich literature of work which studies interactive agents, and grounding their behaviors in language [9,10,11,12]. Many prior works have studied this problem in the context of instruction following, where an agent aims to complete a task specified by formal language/programs [13,14,15,16,17,18] or natural language [10,11,19,20]. While these approaches have been largely studied in simulated spatial games [19,21,22,23] or in object-directed visual navigation in simulated robots [24,25,26,27,28,29,23] some of which include high-level object interaction [30], in this work we focus on the domain of learning control for vision-based robotic manipulation.…”

Section: Related Work (mentioning)

confidence: 99%


Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation

Nair, Mitchell, Chen et al. 2021

Preprint


We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. In order to accomplish this, humans need easy and effective ways of specifying tasks to the robot. Goal images are one popular form of task specification, as they are already grounded in the robot's observation space. However, goal images also have a number of drawbacks: they are inconvenient for humans to provide, they can over-specify the desired behavior leading to a sparse reward signal, or under-specify task information in the case of non-goal reaching tasks. Natural language provides a convenient and flexible alternative for task specification, but comes with the challenge of grounding language in the robot's observation space. To scalably learn this grounding we propose to leverage offline robot datasets (including highly sub-optimal, autonomously collected data) with crowd-sourced natural language labels. With this data, we learn a simple classifier which predicts if a change in state completes a language instruction. This provides a language-conditioned reward function that can then be used for offline multi-task RL. In our experiments, we find that on language-conditioned manipulation tasks our approach outperforms both goal-image specifications and language-conditioned imitation techniques by more than 25%, and is able to perform visuomotor tasks from natural language, such as "open the right drawer" and "move the stapler", on a Franka Emika Panda robot.
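The reward construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the classifier here is a hand-written toy rule standing in for a learned model, and the names `toy_classifier`, `make_reward_fn`, `relabel`, the one-dimensional state, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

State = Sequence[float]
# A classifier maps (initial state, current state, instruction) to a probability
# that the change in state completes the instruction.
Classifier = Callable[[State, State, str], float]


def toy_classifier(initial: State, current: State, instruction: str) -> float:
    """Hypothetical stand-in for a learned classifier (assumption, not from the paper).

    State index 0 is treated as a drawer position; "open" means it should
    increase, anything else means it should decrease.
    """
    delta = current[0] - initial[0]
    sign = 1.0 if "open" in instruction else -1.0
    return 1.0 / (1.0 + math.exp(-10.0 * sign * delta))


def make_reward_fn(classifier: Classifier, threshold: float = 0.5):
    """Turn a completion classifier into a binary language-conditioned reward.

    Thresholding the probability is an assumption of this sketch.
    """
    def reward(initial: State, current: State, instruction: str) -> float:
        return 1.0 if classifier(initial, current, instruction) > threshold else 0.0
    return reward


def relabel(trajectory: List[State], instruction: str, reward_fn) -> List[float]:
    """Assign a reward to every state of an offline trajectory for one instruction."""
    initial = trajectory[0]
    return [reward_fn(initial, state, instruction) for state in trajectory]


if __name__ == "__main__":
    reward_fn = make_reward_fn(toy_classifier)
    trajectory = [[0.0], [0.1], [0.5]]
    print(relabel(trajectory, "open the drawer", reward_fn))
    print(relabel(trajectory, "close the drawer", reward_fn))
```

Relabeling the same offline trajectory under different instructions is what would let a single dataset serve several tasks in an offline multi-task RL setup of the kind the abstract describes.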



“…We include qualitative examples of the ranked predicted trajectories under different language instructions on the real robot in Figures 14, 15, 16, 17, and 18. [Table header: close drawer, open drawer, turn faucet left, turn faucet right, move black mug right, move white mug down, average]…”

Section: D4 Qualitative Examples (mentioning)

confidence: 99%