Data-efficient Co-Adaptation of Morphology and Behaviour with Deep Reinforcement Learning

Luck, Kevin Sebastian; Amor, Heni Ben; Calandra, Roberto

doi:10.48550/arxiv.1911.06832

Cited by 1 publication

(2 citation statements)

References 19 publications


Section: Related Work (mentioning)

confidence: 99%

“…Similar to our approach, several reinforcement learning-based strategies to co-optimization exist for both continuous [1,32,33,34] and discrete design spaces [10,35,14]. Luck et al [33] use a soft actor-critic algorithm and use a design-conditioned Q-function to evaluate designs. Chen et al [34] model the design space as a differentiable computational graph, which allows them to use standard gradient-based methods.…”


N-LIMB: Neural Limb Optimization for Efficient Morphological Design

Schaff, Walter

2022

Preprint


A robot's ability to complete a task is heavily dependent on its physical design. However, identifying an optimal physical design and its corresponding control policy is inherently challenging. The freedom to choose the number of links, their type, and how they are connected results in a combinatorial design space, and the evaluation of any design in that space requires deriving its optimal controller. In this work, we present N-LIMB, an efficient approach to optimizing the design and control of a robot over large sets of morphologies. Central to our framework is a universal, design-conditioned control policy capable of controlling a diverse set of designs. This policy greatly improves the sample efficiency of our approach by allowing the transfer of experience across designs and reducing the cost to evaluate new designs. We train this policy to maximize expected return over a distribution of designs, which is simultaneously updated towards higher performing designs under the universal policy. In this way, our approach converges towards a design distribution peaked around high-performing designs and a controller that is effectively fine-tuned for those designs. We demonstrate the potential of our approach on a series of locomotion tasks across varying terrains and show the discovery of novel and high-performing design-control pairs.
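
The alternating scheme described in the abstract (a shared, design-conditioned policy trained over a design distribution that is itself shifted towards higher-return designs) can be illustrated with a toy sketch. This is not the N-LIMB implementation: the function name `co_optimize`, the linear policy, the one-step return, and all parameter values are hypothetical choices made only to keep the example small and runnable. The published method uses neural policies and simulated locomotion.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def co_optimize(features, bonus, steps=20000, lr_policy=0.02,
                lr_design=0.02, seed=0):
    """Toy co-adaptation loop (illustrative assumptions throughout).

    features[d]: scalar descriptor of design d (e.g. a limb length).
    bonus[d]:    return a perfectly controlled design d would achieve.
    The shared policy is linear in the design feature: action = w * f + b.
    The assumed ideal action for a design equals its feature value.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0                     # shared policy parameters
    logits = [0.0] * len(features)      # categorical design distribution
    baseline = 0.0                      # running mean return (variance reduction)

    for _ in range(steps):
        probs = softmax(logits)
        d = rng.choices(range(len(features)), weights=probs)[0]

        # Evaluate the sampled design under the shared policy.
        f = features[d]
        err = (w * f + b) - f
        ret = bonus[d] - err * err

        # Policy step: gradient ascent on the return w.r.t. (w, b).
        w -= lr_policy * 2.0 * err * f
        b -= lr_policy * 2.0 * err

        # Design step: score-function (REINFORCE-style) update of the logits.
        adv = ret - baseline
        for k in range(len(logits)):
            indicator = 1.0 if k == d else 0.0
            logits[k] += lr_design * adv * (indicator - probs[k])
        baseline += 0.1 * (ret - baseline)

    return (w, b), softmax(logits)


if __name__ == "__main__":
    policy, design_probs = co_optimize([0.5, 1.0, 1.5], [1.0, 3.0, 2.0])
    print("policy (w, b):", policy)
    print("design distribution:", design_probs)
```

In this sketch the design distribution should concentrate on the design with the largest bonus, while the shared policy ends up most accurate for the designs that are sampled most often, which loosely mirrors the "effectively fine-tuned" behaviour the abstract mentions.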
