Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval

Jiang, Chen; Huang, Kaiming; He, Sifeng; Yang, Xudong; Zhang, Wei; Zhang, Xiaobo; Cheng, Yuan; Yang, Lei; Wang, Qing; Xu, Furong; Pan, Tan; Chu, Wei-Ta

doi:10.1145/3474085.3475301

Cited by 18 publications

(23 citation statements)

References 34 publications

“…Inspired by temporal matching kernel [27], LAMV [7] transforms the kernel into a differentiable layer to find temporal alignments. SPD [8] formulates temporal alignment as an object detection task on the frame-to-frame similarity matrix, achieving a state-of-the-art segment-level copy detection performance.…”

Section: Methods (mentioning)

confidence: 99%
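
The frame-to-frame similarity matrix mentioned in the statement above can be illustrated with a minimal sketch. It assumes each frame is already represented by a feature vector and uses cosine similarity; the function names and the toy features are hypothetical and are not taken from SPD, LAMV, or the cited paper.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def frame_similarity_matrix(query_feats, ref_feats):
    """Return sim[i][j] = cosine similarity between query frame i and reference frame j."""
    q = [l2_normalize(f) for f in query_feats]
    r = [l2_normalize(f) for f in ref_feats]
    return [[sum(a * b for a, b in zip(qf, rf)) for rf in r] for qf in q]


# Toy 2-D features (hypothetical): 2 query frames, 3 reference frames.
query = [[1.0, 0.0], [0.0, 1.0]]
reference = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
sim = frame_similarity_matrix(query, reference)
```

A copied segment would tend to show up in such a matrix as a roughly diagonal streak of high values, which is the kind of pattern a detector or alignment step would then try to localize.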

“…Previous segment-level evaluation metrics are introduced with MUSCLE-VCD [15] and VCDB datasets [11]. Most of recent research works [7][8][9] adopt segment precision and recall defined in VCDB as follows:…”

Section: Datasets and Evaluation (mentioning)

confidence: 99%

“…In most cases, video-level copy detection results alone are not sufficient as the detected videos are usually displayed and interacted with system users for downstream tasks. Hence, designing an approach that can locate the copied segments is preferred and has already attracted lots of attentions in recent works [7][8][9][10][11].…”

Section: Introduction (mentioning)

confidence: 99%

“…Meanwhile, existing evaluation protocols for segment-level video copy detection exhibit an obvious drawback that most of them utilize ground-truth copied segments as queries rather than the entire videos [7,8,11]. This is unpractical for real copy detection scenario where it is hard to know a priori that which part of a video will be pirated.…”

Section: Introduction (mentioning)

confidence: 99%

A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection

He, Yang, Jiang et al. 2022

Preprint

Self Cite

In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset. Compared with existing copy detection datasets restricted by either video-level annotation or small-scale, VCSL not only has two orders of magnitude more segment-level labelled data, with 160k realistic video copy pairs containing more than 280k localized copied segment pairs, but also covers a variety of video categories and a wide range of video duration. All the copied segments inside each collected video pair are manually extracted and accompanied by precisely annotated starting and ending timestamps. Alongside the dataset, we also propose a novel evaluation protocol that better measures the prediction accuracy of copy overlapping segments between a video pair and shows improved adaptability in different scenarios. By benchmarking several baseline and state-of-the-art segment-level video copy detection methods with the proposed dataset and evaluation metric, we provide a comprehensive analysis that uncovers the strengths and weaknesses of current approaches, hoping to open up promising directions for future works. The VCSL dataset, metric and benchmark codes are all publicly available at https://github.com/alipay/VCSL.

“…The task of Image-to-Video Retrieval (IVR) [191]- [194] localizes video segments that contain similar activity as in a query image. Similarly, given a query video and a reference video, video re-localization (VRL) [195]- [198] localizes a segment in the reference video that semantically corresponds to the query video. Conceptually, the query is in the form of audio in AVEL, appearance vision in IVR, and motion vision in VRL, respectively.…”

Section: Multi-modal Temporal Grounding In Video (mentioning)

confidence: 99%

The Elements of Temporal Sentence Grounding in Videos: A Survey and Future Directions

Zhang, Sun, Wei et al. 2022

Preprint

Temporal sentence grounding in videos (TSGV), a.k.a., natural language video localization (NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that semantically corresponds to a language query from an untrimmed video. Connecting computer vision and natural language, TSGV has drawn significant attention from researchers in both communities. This survey attempts to provide a summary of fundamental concepts in TSGV and current research status, as well as future research directions. As the background, we present a common structure of functional components in TSGV, in a tutorial style: from feature extraction from raw video and language query, to answer prediction of the target moment. Then we review the techniques for multimodal understanding and interaction, which is the key focus of TSGV for effective alignment between the two modalities. We construct a taxonomy of TSGV techniques and elaborate methods in different categories with their strengths and weaknesses. Lastly, we discuss issues with the current TSGV research and share our insights about promising research directions.

Learning Video Localization on Segment-Level Video Copy Detection with Transformer

Zhang, Liu, Zhang et al. 2023

Lecture Notes in Computer Science
