Probing Spatial Clues: Canonical Spatial Templates for Object Relationship Understanding

Collell, Guillem; Deruyttere, Thierry; Moens, Marie‐Francine

doi:10.1109/access.2021.3113781

Cited by 4 publications

(3 citation statements)

References 95 publications

(103 reference statements)

“…With regards to grounding text to physical locations, Lourentzou, Morales, and Zhai (2017) focus on predicting physical geographic origins of Twitter posts, while Grujicic et al (2020) localize medical text referring to anatomical concepts to their corresponding physical locations in the human body. Finally, (Collell, Deruyttere, and Moens 2021) have also worked on understanding the implicit spatial relationships of queries and objects.…”

Section: Query Understanding (mentioning)

confidence: 99%

Predicting Physical World Destinations for Commands Given to Self-Driving Cars

Grujicic, Deruyttere, Moens, et al. 2022. AAAI.

Self Cite

In recent years, we have seen significant steps taken in the development of self-driving cars. Multiple companies are starting to roll out impressive systems that work in a variety of settings. These systems can sometimes give the impression that full self-driving is just around the corner and that we would soon build cars without even a steering wheel. The increase in the level of autonomy and control given to an AI provides an opportunity for new modes of human-vehicle interaction. However, surveys have shown that giving more control to an AI in self-driving cars is accompanied by a degree of uneasiness by passengers. In an attempt to alleviate this issue, recent works have taken a natural language-oriented approach by allowing the passenger to give commands that refer to specific objects in the visual scene. Nevertheless, this is only half the task as the car should also understand the physical destination of the command, which is what we focus on in this paper. We propose an extension in which we annotate the 3D destination that the car needs to reach after executing the given command and evaluate multiple different baselines on predicting this destination location. Additionally, we introduce a model that outperforms the prior works adapted for this particular setting.

“…Course layout and fine-grained layout spatial features were exploited in [23] to predict HOI, and the authors argued that the appearance features of objects did not affect the HOI prediction performance. Spatial clues for scene understanding are investigated in [50] and propose canonical spatial representation templates that indicate the power of spatial features in visual relationship applications and outperform many HOI state-of-the-art models.…”

Section: E. Spatial Features in HOI (mentioning)

confidence: 99%

Spatial-Net for Human-Object Interaction Detection

et al. 2022

Human-object interaction (HOI) detection is the detection of a human's relationship with an object in still images and videos. The majority of HOI detection methods rely on appearance features as the primary feature for detecting the relationship between humans and objects. Furthermore, the model's performance is affected by the abundance of false-positive pairs generated by the image's non-interactive human-object pairs and human-object mis-grouping. In this paper, we propose "Spatial-Net", a new HOI detection approach in still images. In the proposed approach, the HOI problem is divided into two main tasks, namely pair-prediction and global-rejection. In the pair-prediction task, the spatial relationship is adopted to predict the human-object interaction for each human-object pair using spatial features that contains spatial map which is a single channel image that represents human-object pairs including body parts and object masks, relative geometry features such as relative size, relative distance, and intersection-over-union between body part and objects, and weighted distance that is used as body part attention deterministic model. In the global-rejection task, an augmented model is employed to reject false positive pairs. We use the Hungarian matching technique to assign human-object pairs for each action and human-centric model to reject the non-interaction human-object pairs according to semantic co-occurrence between human and object. The experimental results on the V-COCO dataset demonstrate that the proposed Spatial-Net outperforms many state-of-the-art HOI models with less inference time.

“…Unlike the works that ground the referring expressions in a visual scene, the works of [40], [41] localize the referring expression in the physical geographic regions or the anatomical model of the human body. The work of [42] focuses on capturing implicit spatial relationships between different kinds of objects that appear in the natural language commands. On the other hand, the work of [43] focuses on the task laid out in Touchdown [11], which involves following passenger instructions.…”

Section: B. Natural Language Commands (mentioning)

confidence: 99%

Talk2Car: Predicting Physical Trajectories for Natural Language Commands

et al. 2022

Self Cite

In recent years, there has been an increased interest in giving verbal commands to self-driving cars. Even though multiple companies have showcased progress towards fully autonomous vehicles, surveys have indicated that people are wary of relinquishing total control of the vehicle to the AI. Thus, a system allowing passengers to control the vehicle's actions would be preferable. Natural language, the most widespread form of communication among humans, presents itself as the most natural control interface, and survey results confirm that the ability to give verbal commands to self-driving vehicles would make the passengers more at ease. In this work, we propose a novel system that predicts which object is referred to by the issued command and the path the car should follow through the immediate surroundings to execute the command. We experiment with different approaches and features to predict the object of interest and show that our simple but effective approach achieves state-of-the-art performance. For predicting the trajectory, we propose a model that relies on a mixture density approach for modeling the distributions of key waypoints of the trajectory in the top-down scene layout. Additionally, we investigate the influence of the two tasks on each other and show that improvements in the prediction of the object of interest lead to improvements in the trajectory prediction task. Finally, we provide the research community with an extension to the Talk2Car dataset, with new trajectory annotations for given commands.
