3-D Relation Network for visual relation recognition in videos

Cao, Qianwen; Huang, Heyan; Shang, Xindi; Wang, Boran; Chua, Tat-Seng

doi:10.1016/j.neucom.2020.12.029

Cited by 19 publications

(5 citation statements)

References 12 publications


“…• VRD-STGC [27], which proposes a novel sliding-window scheme to simultaneously predict short-term and long-term relationships [27], and extracts spatiotemporal features. • 3DRN [65], which develops a 3-D CNN to learn the visual features for relation recognition in an end-to-end manner.…”

Section: Multi-expert Performance (mentioning)

confidence: 99%


Video Relationship Detection Using Mixture of Experts

2023


Machine comprehension of visual information from images and videos by neural networks suffers from two limitations: (1) the computational and inference gap in vision and language to accurately determine which object a given agent acts on and then to represent it by language, and (2) the shortcoming in stability and generalization of the classifier trained by a single, monolithic neural network. To address these limitations, we propose MoE-VRD, a novel approach to visual relationship detection via a mixture of experts. MoE-VRD recognizes language triplets in the form of a <subject, predicate, object> tuple to extract the relationship between subject, predicate, and object from visual processing. Since detecting a relationship between a subject (acting) and the object(s) (being acted upon) requires that the action be recognized, we base our network on recent work in visual relationship detection. To address the limitations associated with single monolithic networks, our mixture of experts is based on multiple small models, whose outputs are aggregated. That is, each expert in MoE-VRD is a visual relationship learner capable of detecting and tagging objects. MoE-VRD employs an ensemble of networks while preserving the complexity and computational cost of the original underlying visual relationship model by applying a sparsely-gated mixture of experts, which allows for conditional computation and a significant gain in neural network capacity. We show that the conditional computation capabilities and massive ability to scale the mixture-of-experts lead to an approach to the visual relationship detection problem that outperforms the state-of-the-art.
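The sparsely-gated aggregation mentioned in this abstract can be illustrated with a small sketch. It is not the MoE-VRD implementation: it assumes a plain top-k gate, where only the k experts with the highest gate scores are evaluated and their outputs are combined with softmax weights computed over those k scores. The names `top_k_gate` and `moe_forward`, and the idea of passing precomputed gate scores, are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical expert type: maps an input vector to an output vector.
Expert = Callable[[Sequence[float]], List[float]]


def top_k_gate(gate_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    The weights are a softmax over the selected scores only, so they sum to 1.
    """
    if not 1 <= k <= len(gate_scores):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    top = ranked[:k]
    max_score = max(gate_scores[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(gate_scores[i] - max_score) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: Sequence[float], experts: Sequence[Expert],
                gate_scores: Sequence[float], k: int) -> List[float]:
    """Weighted sum of the outputs of the k selected experts.

    Experts that are not selected are never called (conditional computation).
    """
    output: List[float] = []
    for idx, weight in top_k_gate(gate_scores, k):
        y = experts[idx](x)
        if not output:
            output = [weight * v for v in y]
        else:
            output = [o + weight * v for o, v in zip(output, y)]
    return output
```

In this sketch the cost per input depends on k rather than on the total number of experts, which is the property the abstract refers to as conditional computation.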



“…Relation detection Relation tagging mAP R@50 R@100 P@1 P@5 P@10 3DRN [65] 2.47 2, but here on the VidOR dataset [62].…”

Section: Vidor Dataset (mentioning)

confidence: 99%

Video Relationship Detection Using Mixture of Experts

2023


“…The metric-based meta-learning method is a non-parametric learning model, so its complexity is less than other methods. The idea is to learn the meta-knowledge of how to measure the similarity of samples between the support set and the query set from the embedding space by using feature embedding, such as matching network [23], relation network [24]. Generally, deep neural networks are used to map samples into the feature space, and cosine similarity [25] is used to measure the similarity of features, predict the category labels, calculate the loss and then back propagate to optimize the network.…”

Section: Preliminary Knowledge (mentioning)

confidence: 99%
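The metric-based idea in the statement above (compare a query sample with labelled support samples in an embedding space using cosine similarity) can be sketched as follows. This is a simplified nearest-class rule, not a matching network or relation network: it assumes the samples are already embedded as plain vectors, scores each class by the mean cosine similarity between the query and that class's support samples, and leaves out the loss and backpropagation steps. The function names and the toy labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_query(query: Sequence[float],
                   support: Dict[str, List[Sequence[float]]]) -> str:
    """Predict the label whose support embeddings are most similar to the query.

    Assumption: `support` maps each class label to a non-empty list of
    embedding vectors (the support set of a few-shot episode).
    """
    scores = {}
    for label, samples in support.items():
        if not samples:
            raise ValueError(f"class {label!r} has no support samples")
        scores[label] = sum(cosine_similarity(query, s) for s in samples) / len(samples)
    return max(scores, key=scores.get)


# Toy 2-way, 2-shot episode with hand-made 2-D "embeddings".
support_set = {
    "normal": [[1.0, 0.0], [0.9, 0.1]],
    "fault": [[0.0, 1.0], [0.1, 0.9]],
}
prediction = classify_query([0.8, 0.2], support_set)
```

In a trained system the embeddings would come from a neural network, and the similarity scores would feed a loss that is backpropagated to update that network.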

Few shot cross equipment fault diagnosis method based on parameter optimization and feature metric

Tao, Chen, Qiu, et al. 2022

Meas. Sci. Technol.

128


With the rapid development of industrial informatization and deep learning technology, modern data-driven fault diagnosis (MIFD) methods based on deep learning have been continuously emphasized by the industry. However, most of these methods require sufficient training samples to achieve the desired diagnostic effect, but the scarcity of fault samples in the actual industrial environment leads to the limited development of MIFD methods. In addition, due to the changes of equipment operating conditions and production requirements, data-driven fault diagnosis methods often need to face the cross-domain problem of cross load or even cross different equipment. In this paper, a parameter optimization and feature metric-based fault diagnosis method with few samples, called model agnostic matching network model, is designed for the problem of sparse fault samples and cross-domain between data sets in real industrial environments. The method combines both a parameter-based optimization meta-learning network, which extracts optimization information adapted to different domains, and a metric-based meta-learning network, which extracts metric information for similarity discriminations. The experimental results show that the method outperforms the current baseline method for the 5-shot fault diagnosis problem of rolling bearings under limited data conditions and achieves an accuracy of up to 94.4% in cross-equipment diagnosis experiments from rolling bearings to gas regulators, indicating the feasibility of the method. The features are visualized by t-SNE to show the validity of the model.


“…Snippet relation detection. Many before us have investigated relation detection in videos [5,11,25,30,34,38,39,42,43,44,46,53,59]. Relations in videos provide additional temporal information, important for interactions such as pushing or pulling a closed door.…”

Section: Related Work (mentioning)

confidence: 99%

Social Fabric: Tubelet Compositions for Video Relation Detection

Chen, Shi, Mettes, et al. 2021

Preprint


[Figure 1 panels: a person swordfighting with another person ("approach", "clash", "fall"); an adult chasing a child ("run", "greet")]

Figure 1: Social Fabric encodes compositions of interaction primitives defined over tubelet pairs. The primitives are data driven and may correspond to interactions like "greet", "clash" and "fall". Using the primitives, our two-stage network can classify, detect, and search for complex relations across entire videos.

