“…The underlying idea is to learn a predictive model from observed expert data, e.g., with deep learning. The literature has made significant progress in the following key points: (i) The representation of social interaction as a feature with a Pooling Module (PM) between a variable number of agents [8], [9], [11], [15], [20], [24]. (ii) Capturing the temporal relationships of the sequence, where GRU [1], LSTM [4], and Transformer [28] are among the most popular recurrent neural networks (RNN) [13], [29], [30].…”
Section: Related Work (mentioning)
confidence: 99%
“…(iii) Uncertainty in predicting human movement is modeled with generative frameworks. In [3], [8], [15] the (conditional) GAN (CGAN) [2] and in [13], [24], [26] the (conditional) VAE (CVAE) [6] are used as generative frameworks. The CNF is used for trajectory prediction by [12], [21], [22].…”
Section: Related Work (mentioning)
confidence: 99%
“…We model the social interaction as well as [21], [22] at each time step for all agents simultaneously. Furthermore, we avoid possible collisions with other agents during prediction using our SocialSampling and the collision loss function of [15]. This property of human trajectories is frequently neglected and only a few works such as [15] encourage collision-free prediction.…”
Section: Related Work (mentioning)
confidence: 99%
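To make the notion of "collision-free prediction" in the excerpt above concrete, here is a minimal sketch of a pairwise collision check over predicted trajectories. It is not the collision loss of [15] or the SocialSampling procedure; the function name `count_collisions` and the threshold `min_dist` are hypothetical and chosen only for illustration.

```python
import math

def count_collisions(trajectories, min_dist=0.2):
    """Count (agent pair, time step) events closer than min_dist.

    trajectories: list with one entry per agent; each entry is a list of
    (x, y) positions, one per time step (all agents share the same horizon).
    min_dist: assumed collision threshold in the same unit as the positions.
    """
    collisions = 0
    n = len(trajectories)
    for a in range(n):
        for b in range(a + 1, n):
            # Compare the two agents at matching time steps.
            for (xa, ya), (xb, yb) in zip(trajectories[a], trajectories[b]):
                if math.hypot(xa - xb, ya - yb) < min_dist:
                    collisions += 1
    return collisions
```

A check of this kind is typically used as an evaluation metric on sampled predictions; a differentiable penalty on small pairwise distances would be needed to use it as a training loss.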
“…Note that with the use of SocialSampling (3) another unknown variable εS is introduced which makes it ambiguous to invert the SIM. SIM Implementation: We implement the NN m a , ς a and ε a by adapting the encoder-decoder RNN architecture of [15], which also captures the interaction between agents. All NNs are instantiated per agent, sharing weights θ m , θ ς and θ ε .…”
Section: A Social Interaction Model (mentioning)
confidence: 99%
“…This hidden state initializes RNN dec for the predictive rollouts of the conditional (1) and subsequent SocialSampling (3) to generate S a t and εS a t of its own agent 2 . δ t and S t−1 of all agents are aggregated into a personal social context Γ a t using the Pooling Module (PM) of [15]. Similarly, for SocialSampling, the personal social context εΓ a t is computed with the input δ t and the current state S t , by another PM labeled as εPM.…”
Dense crowds are challenging scenes for an autonomous mobile robot. Planning in such an interactive environment requires predicting uncertain human intentions and reactions to future robot actions. To provide these capabilities, we propose a probabilistic forecasting model that factorizes the human motion uncertainty as follows: 1) A (conditioned) normalizing flow (CNF) estimates the densities of human goals. 2) We forecast the crowd’s trajectory density towards the goals by autoregressively (AR) inferring the individual social action densities simultaneously for a dynamic number of persons. We extend the underlying Gaussian AR framework with our SocialSampling to counteract collisions during sampling. Based on this, we create a model-predictive policy that reasons about the social influence of a controlled robot on neighboring humans. For this, we formulate a novel Social Influence (SI) objective that measures the divergence between a scene prediction conditioned on the robot plan and a scene prediction independent of the robot plan. Experiments with our metrics on real datasets show that the model achieves state-of-the-art accuracy in predicting pedestrian movements and remains collision-free, a property that is still often neglected in current methods. Furthermore, our evaluations show that a robot policy with our SI objective produces safe and proactive behaviors, such as taking evasive action at the right time to avoid conflicts.
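The abstract describes the SI objective as a divergence between two scene predictions but does not say which divergence is used. As a hedged illustration only, the sketch below assumes a Kullback-Leibler divergence between diagonal Gaussian predictions, summed over agents and time steps. The function names `kl_diag_gauss` and `social_influence` and the data layout are hypothetical, not taken from the paper.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance.

    Each argument is a sequence of per-dimension means or variances.
    """
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )

def social_influence(conditioned, independent):
    """Sum the KL divergence over matching (agent, time step) entries.

    conditioned: predictions given the robot plan, as (mean, variance) pairs.
    independent: predictions ignoring the robot plan, in the same order.
    Assumption: KL is used here only as one possible divergence.
    """
    return sum(
        kl_diag_gauss(mu_c, var_c, mu_i, var_i)
        for (mu_c, var_c), (mu_i, var_i) in zip(conditioned, independent)
    )
```

Under this assumption, a value of zero means the robot plan does not change the predicted human motion, and larger values indicate a stronger influence of the plan on the predicted scene.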