“…Motion prediction is a regression task in which a sequence of coordinates corresponding to an agent's future pose or location is predicted from its past pose or location, sometimes in combination with other features such as ego video (e.g., Adeli et al, 2021), maps (e.g., Salzmann et al, 2021), head orientation (e.g., Haddad & Lam, 2021), body positioning (e.g., …), GPS location (e.g., Sadeghian et al, 2018b), and/or visual features extracted from cropped images of the agents in the scene (e.g., Haddad & Lam, 2021). Multimodality in motion prediction is a major area of interest in both trajectory (e.g., Dong et al, 2021; Kosaraju et al, 2019; Gu et al, 2022) and pose (e.g., Fragkiadaki et al, 2017; Gu et al, 2021; Yan et al, 2018) prediction … 2020) and Korbmacher and Tordeux (2021).…”
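To make the regression framing concrete, here is a minimal Python/NumPy sketch. It assumes 2-D positions sampled at a fixed rate. The constant-velocity baseline is a common simple reference point, not a method from any of the works cited above. It maps observed coordinates to future ones. A best-of-K average displacement error (minADE) then illustrates one common way to score multimodal predictors that output several candidate futures. The function names are hypothetical.

```python
import numpy as np


def constant_velocity_forecast(past: np.ndarray, horizon: int) -> np.ndarray:
    """Extrapolate future (x, y) positions from observed ones.

    Hypothetical helper. past: array of shape (T_obs, 2), oldest first, T_obs >= 2.
    Returns an array of shape (horizon, 2).
    """
    velocity = past[-1] - past[-2]  # displacement over the last observed step
    steps = np.arange(1, horizon + 1)[:, None]  # (horizon, 1) step multipliers
    return past[-1] + steps * velocity  # linear extrapolation from the last position


def min_ade(predictions: np.ndarray, ground_truth: np.ndarray) -> float:
    """Best-of-K average displacement error for multimodal forecasts.

    Hypothetical helper. predictions: (K, horizon, 2) candidate futures;
    ground_truth: (horizon, 2). Returns the ADE of the closest candidate.
    """
    # Euclidean error per candidate and per future time step: (K, horizon)
    errors = np.linalg.norm(predictions - ground_truth[None], axis=-1)
    return float(errors.mean(axis=1).min())


if __name__ == "__main__":
    past = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
    forecast = constant_velocity_forecast(past, horizon=3)
    # Two candidate futures: straight ahead, and shifted one unit sideways
    candidates = np.stack([forecast, forecast + np.array([0.0, 1.0])])
    truth = np.array([[3.0, 1.0], [4.0, 1.0], [5.0, 1.0]])
    print(forecast.tolist(), min_ade(candidates, truth))
```

Best-of-K scoring rewards a model for covering the true future with at least one of its candidates. For this reason, it is often paired with a separate single-prediction or likelihood-based metric.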