Reasoning about human-object interactions is a core problem in human-centric scene understanding, and detecting such relations poses a unique challenge to vision systems due to large variations in human-object configurations, multiple co-occurring relation instances, and subtle visual differences between relation categories. To address these challenges, we propose a multi-level relation detection strategy that utilizes human pose cues both to capture the global spatial configuration of relations and to serve as an attention mechanism that dynamically zooms into relevant regions at the human-part level. Specifically, we develop a multi-branch deep network to learn a pose-augmented relation representation at three semantic levels, incorporating interaction context, object features, and detailed semantic part cues. As a result, our approach generates robust predictions on fine-grained human-object interactions with interpretable outputs. Extensive experimental evaluations on public benchmarks show that our model outperforms prior methods by a considerable margin, demonstrating its efficacy in handling complex scenes. Code is available at https://github.com/bobwan1995/PMFNet.
Visual grounding is a ubiquitous building block in many vision-language tasks, yet it remains challenging due to large variations in the visual and linguistic features of grounding entities, strong context effects, and the resulting semantic ambiguities. Prior works typically focus on learning representations of individual phrases with limited context information. To address these limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task. In particular, we introduce a modular graph neural network to compute context-aware representations of phrases and object proposals via message propagation, followed by a graph-based matching module to generate globally consistent localizations of the grounding phrases. We train the entire graph neural network jointly in a two-stage strategy and evaluate it on the Flickr30K Entities benchmark. Extensive experiments show that our method outperforms the prior state of the art by a sizable margin, evidencing the efficacy of our grounding framework. Code is available at https://github.com/youngfly11/LCMCG-PyTorch.
Accessibility is a major measure for evaluating the distribution of service facilities and identifying areas with a shortage of services. Traditional accessibility methods, however, are largely model-based and do not consider the actual utilization of services, which may lead to results that differ from those obtained when people's actual behaviors are taken into account. Based on taxi GPS trajectory data, this paper proposes a novel integrated catchment area (ICA) that incorporates actual human travel behavior to evaluate accessibility to healthcare facilities in Shenzhen, China, using the enhanced two-step floating catchment area (E2SFCA) method; the combined approach is called the E2SFCA-ICA method. First, access probability is proposed to depict the probability of visiting a healthcare facility. Then, integrated access probability (IAP), which combines model-based access probability (MAP) and data-based access probability (DAP), is presented. Under the constraint of IAP, the ICA is generated and divided into distinct subzones. Finally, the ICA and its subzones are incorporated into the E2SFCA method to evaluate the accessibility of the top-tier hospitals in Shenzhen, China. The results show that the ICA not only reduces the differences between model-based and data-based catchment areas, but also distinguishes the core, stable, uncertain, and remote catchment areas of healthcare facilities. The study also found that the accessibility of Shenzhen's top-tier hospitals obtained with traditional catchment areas tends to be overestimated and more unequally distributed in space compared to the accessibility obtained with integrated catchment areas.
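As background, the baseline E2SFCA computation that the ICA modifies can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian distance-decay weight, the variable names, and the flat dictionary inputs are all assumptions, and the paper's contribution of replacing the fixed catchment radius with the data-driven ICA and its subzones is not shown.

```python
import math

def gaussian_weight(d, d0):
    """Gaussian distance-decay weight within catchment radius d0; 0 outside."""
    if d > d0:
        return 0.0
    return math.exp(-0.5 * (d / d0) ** 2)

def e2sfca(demand, supply, dist, d0):
    """Classic two-step floating catchment area accessibility scores.

    demand: dict location -> population
    supply: dict facility -> capacity (e.g. number of physicians or beds)
    dist:   dict (location, facility) -> travel distance or time
    d0:     catchment threshold (same units as dist)
    Returns dict location -> accessibility score.
    """
    # Step 1: supply-to-demand ratio R_j for each facility, weighting the
    # population of every demand location that falls inside its catchment.
    R = {}
    for j, capacity in supply.items():
        weighted_pop = sum(
            pop * gaussian_weight(dist[i, j], d0) for i, pop in demand.items()
        )
        R[j] = capacity / weighted_pop if weighted_pop > 0 else 0.0
    # Step 2: for each demand location, sum the (decay-weighted) ratios of
    # all facilities reachable within the catchment.
    return {
        i: sum(R[j] * gaussian_weight(dist[i, j], d0) for j in supply)
        for i in demand
    }
```

For example, a single location of 100 people at distance 0 from a single hospital with 10 physicians yields an accessibility score of 10/100 = 0.1 physicians per capita. The original E2SFCA uses discrete travel-time zones with stepwise weights; the continuous Gaussian decay used here is a common variant chosen for brevity.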