Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475301

Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval

Abstract: With the explosive growth of web videos in recent years, large-scale Content-Based Video Retrieval (CBVR) becomes increasingly essential in video filtering, recommendation, and copyright protection. Segment-level CBVR (S-CBVR) locates the start and end time of similar segments in finer granularity, which is beneficial for user browsing efficiency and infringement detection especially in long video scenarios. The challenge of S-CBVR task is how to achieve high temporal alignment accuracy with efficient computat…

Citations: cited by 18 publications (23 citation statements)
References: 34 publications
“…Inspired by temporal matching kernel [27], LAMV [7] transforms the kernel into a differentiable layer to find temporal alignments. SPD [8] formulates temporal alignment as an object detection task on the frame-to-frame similarity matrix, achieving a state-of-the-art segment-level copy detection performance.…”
Section: Methods (mentioning)
confidence: 99%
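
The frame-to-frame similarity matrix mentioned in the statement above can be illustrated with a minimal sketch. It assumes each frame is already represented by a feature vector and uses cosine similarity; the function name, the toy features, and the choice of cosine similarity are assumptions made for illustration, not details taken from SPD or the citing paper.

```python
import math


def cosine_similarity_matrix(query_feats, ref_feats):
    """Return S where S[i][j] is the cosine similarity between
    query frame i and reference frame j (hypothetical helper)."""

    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        # Leave all-zero vectors unchanged to avoid division by zero.
        return [x / norm for x in vec] if norm > 0 else list(vec)

    q = [normalize(v) for v in query_feats]
    r = [normalize(v) for v in ref_feats]
    return [[sum(a * b for a, b in zip(qv, rv)) for rv in r] for qv in q]


# Toy per-frame features (made up): query frames 0 and 1 point in the
# same directions as reference frames 1 and 2, respectively.
query = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]

sim = cosine_similarity_matrix(query, reference)
for row in sim:
    print([round(value, 3) for value in row])
```

In a matrix like this, a copied segment tends to show up as a run of high values along a diagonal, which is the kind of pattern a detector operating on the matrix would look for.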
“…Previous segment-level evaluation metrics are introduced with MUSCLE-VCD [15] and VCDB datasets [11]. Most of recent research works [7][8][9] adopt segment precision and recall defined in VCDB as follows:…”
Section: Datasets and Evaluation (mentioning)
confidence: 99%
“…The task of Image-to-Video Retrieval (IVR) [191]-[194] localizes video segments that contain similar activity as in a query image. Similarly, given a query video and a reference video, video re-localization (VRL) [195]-[198] localizes a segment in the reference video that semantically corresponds to the query video. Conceptually, the query is in the form of audio in AVEL, appearance vision in IVR, and motion vision in VRL, respectively.…”
Section: Multi-modal Temporal Grounding In Video (mentioning)
confidence: 99%