2021
DOI: 10.1109/tip.2021.3113791

Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Cited by 30 publications (12 citation statements)
References: 51 publications

“…Subsequent work generally follows the strategies of TGN or SCDM with more sophisticated learning modules and/or auxiliary objectives. To be specific, CMIN [50], [78], CBP [79], FIAN [80], HDRR [81], and MIGCN [82] adopt the strategy of TGN, while CSMGAN [83], RMN [84], IA-Net [85], and DCT-Net [86] apply the strategy of SCDM. These solutions design various cross-modal reasoning strategies to perform more fine-grained and deeper multi-modal interaction between video and query, for precise moment localization.…”
Section: Anchor-based Methods (mentioning, confidence: 99%)
“…Some other works adopt a boundary regression module to refine the start and end time points of generated moments. MIGCN [82] develops a rank module, apart from the boundary regression module, to distinguish the optimal proposal from a set of similar proposal candidates. 2D-Map Anchor-based Method.…”
Section: Anchor-based Methods (mentioning, confidence: 99%)
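A minimal sketch of the rank-plus-boundary-regression idea mentioned in the statement above, written in PyTorch. The class name RankAndRefine, the head names, and all dimensions are assumptions for illustration and do not reproduce MIGCN's actual implementation; the sketch only shows how a scoring head can pick the best proposal among similar candidates while a regression head refines its start/end boundaries.

import torch
import torch.nn as nn

class RankAndRefine(nn.Module):
    # Illustrative only: one head scores each proposal, another regresses
    # (start, end) offsets that refine the coarse proposal boundaries.
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.rank_head = nn.Linear(feat_dim, 1)       # matching score per proposal
        self.boundary_head = nn.Linear(feat_dim, 2)   # (start offset, end offset)

    def forward(self, proposal_feats, proposal_bounds):
        # proposal_feats:  (N, feat_dim) fused video-query features, one per proposal
        # proposal_bounds: (N, 2) coarse (start, end) times of the proposals
        scores = self.rank_head(proposal_feats).squeeze(-1)   # (N,)
        offsets = self.boundary_head(proposal_feats)          # (N, 2)
        refined = proposal_bounds + offsets                   # regressed boundaries
        best = scores.argmax()                                # highest-ranked candidate
        return refined[best], scores

model = RankAndRefine(feat_dim=256)
feats = torch.randn(8, 256)                                   # 8 similar candidates
bounds = torch.tensor([[2.0 * i, 2.0 * i + 4.0] for i in range(8)])
best_span, scores = model(feats, bounds)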
“…TGN [2] temporally captures the evolving fine-grained frame-by-word interactions and uses pre-set anchors to produce multi-scale proposal candidates ending at each time step. Subsequently, [15, 21, 33, 34] follow the anchor-based framework and propose various multi-modal reasoning strategies to achieve precise moment localization. In addition, 2D-TAN [32] enumerates all possible segments as proposal candidates and converts them into a 2D feature map; a temporal adjacent network is then proposed to obtain multi-modal representations and encode the video context information.…”
Section: Short-form Video Temporal Grounding (mentioning, confidence: 99%)
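The anchor enumeration that TGN-style methods rely on, as described in the statement above, can be sketched in a few lines of Python. The anchor widths below are arbitrary example values, not those used by any cited method: at each time step t, every pre-set width yields a candidate moment ending at t, producing the multi-scale proposals that are then scored against the query.

def enumerate_anchor_proposals(num_steps, anchor_widths=(4, 8, 16, 32)):
    # For every time step t, emit one candidate (start, end) per anchor width,
    # all ending at t; candidates that would start before the video are dropped.
    proposals = []
    for t in range(num_steps):
        for width in anchor_widths:
            start = t - width + 1
            if start >= 0:
                proposals.append((start, t))
    return proposals

# A clip with 128 feature steps yields up to 4 candidates per step.
print(len(enumerate_anchor_proposals(128)))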
“…HDRR [71], and MIGCN [72] adopt the strategy of TGN, while CSMGAN [73], RMN [74], IA-Net [75], and DCT-Net [76] apply the strategy of SCDM. These solutions design various crossmodal reasoning strategies to perform more fine-grained and deeper multi-modal interaction between video and query, for precise moment localization.…”
Section: Temporal Adjacent Network (mentioning, confidence: 99%)
“…Some other works adopt a boundary regression module to refine the start and end timestamps of generated moments. MIGCN [72] develops a rank module, apart from the boundary regression module, to distinguish the optimal proposal from a set of similar proposal candidates. Before 2D-Map methods, a prior work, TMN [77], first proposes to enumerate all possible consecutive segments as proposals and predict the best-matched proposal as the result by interacting each proposal with the query.…”
Section: Temporal Adjacent Network (mentioning, confidence: 99%)
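The exhaustive matching that TMN is credited with in the statement above can also be sketched directly. The mean-pooling and cosine-similarity scoring here are stand-ins chosen for brevity, not TMN's actual interaction module: every consecutive segment (i, j) becomes a proposal, each proposal is scored against the query, and the best-matched segment is returned.

import torch
import torch.nn.functional as F

def best_matching_segment(clip_feats, query_feat):
    # clip_feats: (T, d) per-step video features; query_feat: (d,) sentence feature.
    # Enumerates all consecutive segments and keeps the highest-scoring one.
    num_steps = clip_feats.size(0)
    best_score, best_span = float("-inf"), (0, 0)
    for i in range(num_steps):
        for j in range(i, num_steps):
            proposal = clip_feats[i:j + 1].mean(dim=0)               # pooled segment feature
            score = F.cosine_similarity(proposal, query_feat, dim=0)
            if score.item() > best_score:
                best_score, best_span = score.item(), (i, j)
    return best_span, best_score

span, score = best_matching_segment(torch.randn(32, 64), torch.randn(64))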