This paper presents a method for constructing human-robot interaction policies in settings where multimodality, i.e., the possibility of multiple highly distinct futures, plays a critical role in decision making. We are motivated in this work by the example of traffic weaving, e.g., at highway on-ramps/off-ramps, where entering and exiting cars must swap lanes in a short distance, a challenging negotiation even for experienced drivers due to the inherent multimodal uncertainty of who will pass whom. Our approach is to learn multimodal probability distributions over future human actions from a dataset of human-human exemplars and to perform real-time robot policy construction in the resulting environment model through massively parallel sampling of human responses to candidate robot action sequences. Direct learning of these distributions is made possible by recent advances in the theory of conditional variational autoencoders (CVAEs): we learn action distributions conditioned simultaneously on the present interaction history and on candidate future robot actions, so that response dynamics are taken into account. We demonstrate the efficacy of this approach with a human-in-the-loop simulation of a traffic weaving scenario.

$\ldots,\, x^{(t+i+1)}) = J_c + J_a + J_l + J_d$, corresponding to collision avoidance, control effort, lane change incentive, and longitudinal disambiguation incentive, defined as:

$$J_c = 1000 \cdot \mathbb{1}_{\{|\Delta s| < 8 \,\wedge\, |\Delta\tau| < 2\}} \cdot \left(9.25 - \sqrt{\Delta s^2 + \Delta\tau^2}\right)$$

$$J_a = \ddot{s}_r^2$$

$$J_l = -500 \cdot \min(1.5 + s_r/150,\ 1) \cdot |\tau_r - \tau_{\text{goal}}|$$

$$J_d = -100 \cdot \min(\max(\Delta s\,\Delta\dot{s},\ 0),\ 1)$$
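The four cost terms can be evaluated directly for a single time step, as in the minimal Python sketch below. It is illustrative only and not the authors' implementation. The function name `stage_cost` and all argument names are hypothetical. The sketch assumes that Δs and Δτ are the differences between the two cars' longitudinal and lateral positions, that Δṡ is their relative longitudinal velocity, that the collision term uses the Euclidean distance sqrt(Δs² + Δτ²), and that the control effort is the squared longitudinal acceleration of the robot car.

```python
import math


def stage_cost(delta_s, delta_tau, delta_s_dot, s_r, s_ddot_r, tau_r, tau_goal):
    """Sum of the four cost terms J_c + J_a + J_l + J_d for one time step.

    All argument names are hypothetical; units and sign conventions are
    assumed to follow the text.
    """
    # J_c: collision avoidance, active only inside the |ds| < 8, |dtau| < 2 box.
    in_box = abs(delta_s) < 8 and abs(delta_tau) < 2
    j_c = 1000.0 * (9.25 - math.hypot(delta_s, delta_tau)) if in_box else 0.0

    # J_a: control effort (assumed: squared longitudinal acceleration).
    j_a = s_ddot_r ** 2

    # J_l: lane change incentive, scaled by longitudinal position s_r.
    j_l = -500.0 * min(1.5 + s_r / 150.0, 1.0) * abs(tau_r - tau_goal)

    # J_d: longitudinal disambiguation incentive, clipped to [0, 1] before scaling.
    j_d = -100.0 * min(max(delta_s * delta_s_dot, 0.0), 1.0)

    return j_c + j_a + j_l + j_d
```

With both cars at the same position and everything else zero, only the collision term is active and the cost is 1000 · 9.25. Once the cars are more than 8 apart longitudinally, the indicator switches the collision term off entirely.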