2019
DOI: 10.1109/tcsvt.2018.2867286

Dual-Stream Recurrent Neural Network for Video Captioning

Abstract: News captioning aims to describe an image with its news article body as input. It greatly relies on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to understand named entities for news captioning. By "understand", we mean correlating the news content with common sense in the wild, which helps an agent to 1) distinguish semantically similar named entities and 2) describe named entities using words outside of training corpora. O…

Cited by 91 publications (27 citation statements) | References 83 publications

“…We validated the effectiveness of CMAT through extensive comparative and ablative experiments. Moving forward, we are going to 1) design a more effective graph-level metric to guide the CMAT training and 2) apply CMAT in downstream tasks such as VQA [61,70], dialog [41], and captioning [64].…”
Section: Discussion (mentioning)
confidence: 99%
“…To provide insight into which parts are highlighted for predicate inference, we visualize the attended parts in the subjects and objects. Different from the previous attention visualization works [1,42], we further analyze the association weights of pairwise parts during inter-object interactive learning. Particularly, in the proposed visual mutual attention module, the part correlation matrix represents the association weights between subject and object parts, which are transferred into the part-aware attentive weights of individual objects.…”
Section: Effectiveness of Intra-object Attention Mechanism (Q3) (mentioning)
confidence: 99%
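The excerpt above describes a part correlation matrix between subject and object parts that is then turned into part-aware attention weights for each object. The sketch below shows one way such a step could work. It assumes scaled dot-product correlation, max-pooling of each part's correlations over the other object's parts, and a softmax. All function names and these pooling choices are hypothetical illustrations, not the cited paper's exact formulation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def part_correlation(subject_parts: Matrix, object_parts: Matrix) -> Matrix:
    # C[i][j] = <s_i, o_j> / sqrt(d): association weight between
    # subject part i and object part j (assumed scaled dot product).
    d = len(subject_parts[0])
    scale = math.sqrt(d)
    return [
        [sum(a * b for a, b in zip(s, o)) / scale for o in object_parts]
        for s in subject_parts
    ]


def mutual_part_attention(
    subject_parts: Matrix, object_parts: Matrix
) -> Tuple[Matrix, List[float], List[float]]:
    # Hypothetical pooling: each part is scored by its strongest
    # association with any part of the other object, then the scores
    # are normalized into per-object attention weights.
    corr = part_correlation(subject_parts, object_parts)
    subject_scores = [max(row) for row in corr]
    object_scores = [max(col) for col in zip(*corr)]
    return corr, softmax(subject_scores), softmax(object_scores)


def attend(parts: Matrix, weights: List[float]) -> List[float]:
    # Weighted sum of part features gives one part-aware feature per object.
    return [
        sum(w * p[k] for w, p in zip(weights, parts))
        for k in range(len(parts[0]))
    ]


if __name__ == "__main__":
    subject = [[1.0, 0.0], [0.0, 1.0]]           # 2 subject parts, d = 2
    obj = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # 3 object parts, d = 2
    corr, w_subj, w_obj = mutual_part_attention(subject, obj)
    print(corr)
    print(w_subj, w_obj)
    print(attend(subject, w_subj))
```

Max-pooling is only one choice. Mean-pooling over the other object's parts, or a learned projection of the correlation matrix, would match the same description equally well.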
“…Liu et al. [30] proposed a semisupervised Bayesian attribute learning framework to optimize representation learning and re-identification probability estimation. Distinguishable features have also been widely studied in human action recognition and object recognition tasks [31]–[34].…”
Section: Related Work (mentioning)
confidence: 99%