Dual-Stream Recurrent Neural Network for Video Captioning

Xu, Ning; Liu, An-An; Wong, Yongkang; Zhang, Yongdong; Nie, Weizhi; Su, Yuting; Kankanhalli, Mohan

doi:10.1109/tcsvt.2018.2867286

Cited by 91 publications

(27 citation statements)

References 83 publications


“…We validated the effectiveness of CMAT through extensive comparative and ablative experiments. Moving forward, we are going to 1) design a more effective graph-level metric to guide the CMAT training and 2) apply CMAT in downstream tasks such as VQA [61,70], dialog [41], and captioning [64].…”

Section: Discussion (mentioning)

confidence: 99%

Counterfactual Critic Multi-Agent Training for Scene Graph Generation

Chen, Zhang, Xiao, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Scene graphs, with objects as nodes and visual relationships as edges, describe the whereabouts and interactions of objects in an image for comprehensive scene understanding. To generate coherent scene graphs, almost all existing methods exploit the fruitful visual context by modeling message passing among objects. For example, "person" on "bike" can help to determine the relationship "ride", which in turn contributes to the confidence of the two objects. However, we argue that the visual context is not properly learned under the prevailing cross-entropy-based supervised learning paradigm, which is not sensitive to graph inconsistency: errors at hub and non-hub nodes should not be penalized equally. To this end, we propose a Counterfactual critic Multi-Agent Training (CMAT) approach. CMAT is a multi-agent policy gradient method that frames objects as cooperative agents and then directly maximizes a graph-level metric as the reward. In particular, to assign the reward properly to each agent, CMAT uses a counterfactual baseline that disentangles the agent-specific reward by fixing the predictions of other agents. Extensive validation on the challenging Visual Genome benchmark shows that CMAT achieves state-of-the-art performance, with significant gains under various settings and metrics.
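The counterfactual baseline mentioned in this abstract can be illustrated with a small toy sketch. It is not the CMAT implementation: the function names, the toy reward, and the policies below are all hypothetical, and the baseline is written in the generic form "expected reward when only one agent's prediction is varied while the others stay fixed", which is an assumption about how such a baseline is commonly computed.

```python
from typing import Callable, List, Sequence


def counterfactual_advantage(
    actions: Sequence[int],
    policies: Sequence[Sequence[float]],
    reward_fn: Callable[[Sequence[int]], float],
    agent: int,
) -> float:
    """Joint reward minus the counterfactual baseline for one agent.

    The baseline is the expected reward when only `agent` changes its
    prediction (weighted by its own policy) and all others stay fixed.
    """
    actual = reward_fn(actions)
    baseline = 0.0
    for alternative, prob in enumerate(policies[agent]):
        swapped = list(actions)
        swapped[agent] = alternative  # vary only this agent's prediction
        baseline += prob * reward_fn(swapped)
    return actual - baseline


def toy_reward(actions: Sequence[int]) -> float:
    """Hypothetical graph-level reward: number of labels matching a target."""
    target = [1, 0, 2]
    return float(sum(a == t for a, t in zip(actions, target)))


# Hypothetical per-agent class distributions and sampled predictions.
policies: List[List[float]] = [[0.2, 0.8], [0.5, 0.5], [0.1, 0.1, 0.8]]
actions = [1, 1, 2]

advantages = [
    counterfactual_advantage(actions, policies, toy_reward, i)
    for i in range(len(actions))
]
print(advantages)
```

In this toy setting the second agent, whose prediction disagrees with the target, receives a negative advantage, while the two agents that predicted correctly receive positive ones.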


“…To provide insight in what parts are highlighted for predicate inference, we visualize the attended parts in the subjects and objects. Different from the previous attention visualization works [1,42], we further analyze the association weights of pairwise parts during inter-object interactive learning. Particularly, in the proposed visual mutual attention module, the part correlation matrix represents the association weights between subject and object parts, which are transferred into the part-aware attentive weights of individual objects.…”

Section: Effectiveness Of Intra-object Attention Mechanism (Q3) (mentioning)

confidence: 99%

Part-Aware Interactive Learning for Scene Graph Generation

Tian, Liu, et al. 2020

Proceedings of the 28th ACM International Conference on Multimedia

Self Cite


Generating scene graphs to describe the whereabouts and interactions of objects in an image has attracted increasing attention from researchers. Most existing methods explore object-level visual context or bodypart-object cooperation with a message passing structure, which cannot capture the part-aware interactive nature of scene graphs. Normally, a subject interacts with an object through crucial parts of each. Besides, the correlation among parts within the same object can also help in predicting objects and their relationships. Hence, both subject and object parts, together with their intra- and inter-object correlations, should be fully considered for scene graph generation. In this paper, we propose a part-aware interactive learning method, which is divided into intra-object and inter-object scenarios. First, we detect objects in an image and further decompose each one into a set of parts. Second, the part-aware graph attention module is proposed to refine part features via intra-object message passing, and the refined features are incorporated for object inference. Third, the visual mutual attention module is designed to discover part-aware correlated visual cues precisely for predicate inference. It can highlight the subject-related object parts and the object-related subject parts during inter-object interactive learning. We demonstrate the superiority of our method over the state of the art on Visual Genome. Ablation studies and visualization further validate its effectiveness.

CCS CONCEPTS • Computing methodologies → Scene understanding.


“…Liu et al [30] proposed a semisupervised Bayesian attribute learning framework to optimize representation learning and re-identification probability estimation. Distinguishable feature have also been widely studied in human action recognition and object recognition tasks [31]- [34].…”

Section: Related Work (mentioning)

confidence: 99%

Deep Feature Ranking for Person Re-Identification

et al. 2019


Person re-identification plays a critical part in many surveillance applications. Due to complicated illumination environments and varying viewpoints, extracting robust features remains a challenging problem. To solve this issue, we propose a novel deep feature ranking scheme. Our main contribution is to rank the deep features obtained from a classic deep learning model and to use the sort-order numbers as our feature vector, named ordinal deep features (ODFs). Person re-identification results are acquired by ranking person candidates according to distances measured on ODFs. Because it relies on rank orders rather than the original feature values, our method achieves robust results, especially under viewpoint shifts. Comprehensive experiments are carried out to demonstrate the significance of the proposed feature. In comparative experiments on a publicly available dataset, our method achieves promising performance and outperforms state-of-the-art methods. Moreover, we applied the proposed feature to image classification and discuss its effectiveness.

INDEX TERMS Ordinal deep features, person re-identification, deep neural network, video surveillance.
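The ranking idea in this abstract can be sketched in a few lines. This is a minimal illustration and not the authors' code: the helper names, the toy feature vectors, the choice of L1 distance, and the ascending rank order are all assumptions, since the abstract does not specify them.

```python
from typing import Dict, List, Sequence


def ordinal_features(features: Sequence[float]) -> List[int]:
    """Replace each value by its rank within the vector (0 = smallest)."""
    order = sorted(range(len(features)), key=lambda i: features[i])
    ranks = [0] * len(features)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks


def l1_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Assumed distance between two rank vectors (sum of absolute gaps)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical deep features: the first gallery entry is the query scaled
# by 2, so its value ordering is unchanged.
query = [0.9, 0.1, 0.5, 0.3]
gallery: Dict[str, List[float]] = {
    "same_person_scaled": [1.8, 0.2, 1.0, 0.6],
    "other_person": [0.1, 0.9, 0.3, 0.5],
}

query_odf = ordinal_features(query)
ranked = sorted(
    gallery,
    key=lambda name: l1_distance(query_odf, ordinal_features(gallery[name])),
)
print(query_odf, ranked)
```

Because only the ordering of values matters, a uniformly scaled copy of the query features maps to the same rank vector, which is one way to read the robustness claim in the abstract.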
