2022
DOI: 10.1016/j.neucom.2021.11.019

STCM-Net: A symmetrical one-stage network for temporal language localization in videos

Cited by 7 publications (2 citation statements)
References 7 publications
“…In this sense, multimodal interaction is overlooked. To remedy, PLN [89], SMIN [53], CLEAR [90], and STCM-Net [91] disentangle video proposals into different temporal granularities [89], [91] or different semantic contents [53], [90], and perform cross-modal reasoning at both coarse- and fine-grained granularities. VLG-Net [92] and RaNet [54] maintain query words and video proposals in a graph, and adopt GCN [4], [93] to conduct intra- and inter-modal interactions for cross-modal reasoning.…”
Section: Temporal Adjacent Network (mentioning)
confidence: 99%
“…In this sense, multimodal interaction is overlooked. To remedy, PLN [79], SMIN [57], CLEAR [80], and STCM-Net [81] disentangle video proposals into different temporal granularities [79,81] or different semantic contents [57,80], and perform cross-modal reasoning at both coarse- and fine-grained granularities. VLG-Net [82] and RaNet [58] maintain query words and video proposals in a graph, and adopt GCN [83,84] to conduct intra- and inter-modal interactions for cross-modal reasoning.…”
Section: 2D-Map (mentioning)
confidence: 99%
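
Both statements describe the same design pattern: video proposals are disentangled into coarse and fine temporal granularities, and cross-modal reasoning against the query is performed at both levels. Below is a minimal PyTorch sketch of that idea; every module name, tensor shape, and the pooling/fusion choices are illustrative assumptions, not the published implementations of STCM-Net, PLN, SMIN, CLEAR, VLG-Net, or RaNet.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGranularityCrossModal(nn.Module):
    """Toy sketch: cross-modal reasoning between a sentence query and
    video clip features at a fine and a coarse temporal granularity."""

    def __init__(self, dim=256):
        super().__init__()
        # One cross-attention block per granularity (assumed design).
        self.fine_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.coarse_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, clip_feats, query_feats):
        # clip_feats:  (B, T, D) per-clip video features (fine granularity)
        # query_feats: (B, L, D) word-level query features
        # Coarse granularity: average-pool pairs of adjacent clips.
        coarse = F.avg_pool1d(clip_feats.transpose(1, 2), kernel_size=2).transpose(1, 2)
        # Cross-modal attention: video features attend to query words.
        fine, _ = self.fine_attn(clip_feats, query_feats, query_feats)    # (B, T, D)
        coarse, _ = self.coarse_attn(coarse, query_feats, query_feats)    # (B, T//2, D)
        # Upsample the coarse stream back to T steps and fuse per clip.
        coarse_up = F.interpolate(coarse.transpose(1, 2), size=fine.size(1)).transpose(1, 2)
        fused = torch.cat([fine, coarse_up], dim=-1)                      # (B, T, 2D)
        return self.score(fused).squeeze(-1)                              # (B, T) query-relevance per clip

# Toy usage: 2 videos, 16 clips, 8 query words, 256-d features.
model = MultiGranularityCrossModal(dim=256)
scores = model(torch.randn(2, 16, 256), torch.randn(2, 8, 256))
print(scores.shape)  # torch.Size([2, 16])

The toy call runs as written and prints torch.Size([2, 16]); the cited systems replace the average pooling and attention here with their own granularity decompositions (or, in VLG-Net and RaNet, with GCN layers over a joint word–proposal graph).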