Answering why-not questions on semantic multimedia queries

et al. 2020

AAAI

Self Cite

The aim of visual relation detection is to provide a comprehensive understanding of an image by describing all the objects within the scene, and how they relate to each other, in <object-predicate-object> form; for example, <person-lean on-wall>. This ability is vital for image captioning, visual question answering, and many other applications. However, visual relationships have long-tailed distributions and, thus, the limited availability of training samples is hampering the practicability of conventional detection approaches. With this in mind, we designed a novel model for visual relation detection that works in one-shot settings. The embeddings of objects and predicates are extracted through a network that includes a feature-level attention mechanism. Attention alleviates some of the problems with feature sparsity, and the resulting representations capture more discriminative latent features. The core of our model is a dual graph neural network that passes and aggregates the context information of predicates and objects in an episodic training scheme to improve recognition of the one-shot predicates and then generate the triplets. To the best of our knowledge, we are the first to center on the viability of one-shot learning for visual relation detection. Extensive experiments on two newly constructed datasets show that our model significantly improved the performance of two tasks, PredCls and SGCls, from 2.8% to 12.2% compared with state-of-the-art baselines.

Section: Introduction (mentioning)

confidence: 99%

One-Shot Learning for Long-Tail Visual Relation Detection

et al. 2020

AAAI

Self Cite

“…where i and j represent the positions of the corresponding pixels in the image; (4) According to the linear equation of the acceleration vector, the intersection point P = {p_1, p_2, …, p_s} is calculated. The main calculation is the intersection of all the two straight lines, and the mathematical derivation is as follows [18]:…”

Section: Anomaly Localization Algorithm Based On Single Escape Center (mentioning)

confidence: 99%

Big data analytics of crime prevention and control based on image processing upon cloud computing

Xu¹,

Cheng²,

Sugumaran³

2020

JSSS

Aim: Current observation of criminal behavior is not performed in real time, so criminal behavior cannot be promptly controlled. To improve the control of criminal behavior, this study builds on cloud-based image processing and applies data mining to criminal behavior. Methods: This study obtained many criminal behavior characteristics through data collection and used the rapid response capability of cloud computing for data processing. In addition, to improve the accuracy of criminal behavior recognition, the identification method for criminal behaviors in selected populations was studied, and image processing technology was used to identify individual crimes and to segment subjects. Results: Our work used statistical methods to collect the characteristics of criminal behavior, and we designed experiments to verify the effectiveness of the algorithm. The experimental research shows that the algorithm has high accuracy in identifying abnormal behavior. Conclusion: The algorithm identifies abnormal behavior with relatively high accuracy and has high practical value; it can meet the accuracy and real-time requirements of security systems.

“…Equivalently, a scene graph can also be represented as a set of triplets <subject-relation-object>. Scene graph generation is useful for a wide range of image understanding tasks, such as captioning [6,9,10,11,41], retrieval [8,16], reasoning [26,31], multimodal knowledge graph [32], and visual question answering [33,37,38,50]. It bridges the gap between visual perception and high-level cognition.…”

Section: Introduction (mentioning)

confidence: 99%

Memory-Based Network for Scene Graph with Unbalanced Relations

Liu

Proceedings of the 28th ACM International Conference on Multimedia

et al. 2020

Self Cite

The scene graph, which can be represented by a set of visual triples, is composed of objects and the relations between object pairs. It is vital for image captioning, visual question answering, and many other applications. However, scene graph datasets follow a long-tailed distribution, and tail relations cannot be accurately identified due to the lack of training samples. Nonstandard labels and feature overlap in scene graphs hinder the extraction of discriminative features and exacerbate the effect of data imbalance on the model. For these reasons, we propose a novel scene graph generation model that can effectively improve the detection of low-frequency relations. We use memory features to transfer high-frequency relation features to low-frequency relation features. Extensive experiments on scene graph datasets show that our model significantly improved performance on two evaluation metrics, R@K and mR@K, compared with state-of-the-art baselines. CCS Concepts: Computing methodologies → Scene understanding; Image representations.