V-FIRST: A Flexible Interactive Retrieval System for Video at VBS 2022

Tran, Minh-Triet; Hoang-Xuan, Nhat; Trang-Trung, Hoang-Phuc; Le, Thanh-Cong; Tran, Mai-Khiem; Le, Minh-Quan; Le, Tu-Khiem; Ninh, Van-Tu; Gurrin, Cathal

doi:10.1007/978-3-030-98355-0_55

Cited by 7 publications

(7 citation statements)

References 16 publications

“…For instance, vibro [99] employs the OpenCLIP ViT-L/14 [14], [60] trained on LAION-2B [101] to produce joint text-visual embeddings. VideoCLIP [82] and v-FIRST [110] uses the visual transformer CLIP ViT-L@336 [15], [60], [86] trained on the LAION-2B dataset. In VideoCLIP, the integration of Milvus [113] vector database facilitates seamless matching between embeddings.…”

Section: Model System (mentioning)

confidence: 99%

“…In VideoCLIP, the integration of Milvus [113] vector database facilitates seamless matching between embeddings. v-FIRST [59] presents a revised version of their previous interactive video retrieval system [110], which supports querying by textual descriptions and visual examples. The joint text-visual feature space is the basis for many of v-FIRST's functionalities, such as optimized vector search, fast neighbor search, and compression of similar video segments.…”

Section: Model System (mentioning)

confidence: 99%

Evaluating Performance and Trends in Interactive Video Retrieval: Insights From the 12th VBS Competition

Vadicamo, Arnold, Bailer et al. 2024

IEEE Access

This paper conducts a thorough examination of the 12th Video Browser Showdown (VBS) competition, which is a well-established international benchmarking campaign for interactive video search systems. The annual VBS competition has witnessed a steep rise in the popularity of multimodal embedding-based approaches in interactive video retrieval. The majority of the thirteen systems participating in VBS 2023 utilized a CLIP-based cross-modal search model, allowing the specification of free-form text queries to search visual content. This shared emphasis on joint embedding models contributed to balanced performance across various teams. However, the distinguishing factors of the top-performing teams included the adept combination of multiple models and search modes, along with the capabilities of interactive interfaces to facilitate and refine the search process. Our work provides an overview of the state-of-the-art approaches employed by the participating systems and conducts a thorough analysis of their search logs, which record user interactions and results of their queries for each task. Our comprehensive examination of the VBS competition offers assessments of the effectiveness of the retrieval models employed, the browsing efficiency, and user query patterns. Additionally, it provides valuable insights into the evolving landscape of interactive video retrieval and its future challenges.

“…For additional detail about any system, please see the corresponding publication referenced beside the system name in the overview table.
[42] KR 249 25 2 ✓ ✓ ✓ ✓ ✓ ✓
AVSEEKER [41] IE 207 25 2
[75] VN 200 26 2 ✓ ✓ ✓ ✓ ✓ ✓
VideoFall [60] IE 197 25 2 ✓ ✓ ✓ ✓ ✓ ✓ ✓
VERGE [6] GR 176 24 2 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
vitrivr [28] CH 175 21 2 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
VNUHCM [54] VN 161 22 2 ✓ ✓ ✓ ✓
VIREO [55] SG 158 16 2
[34] VN 146 22 2 ✓ ✓ ✓ ✓ ✓
vitrivr-VR [71] CH 137 20 2 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
diveXplore [43] AT 75 14 2 ✓ ✓ ✓ ✓ ✓
Exquisitor [40] DK 72 14 1…”

Section: Related Work Used By Participating Systems (mentioning)

confidence: 99%

“…V-FIRST [75] simply allows the user to input two separate queries, then uses a weighted sum of the two queries to generate ordered pairs of images in a video and return them for the user to browse.…”

Section: Temporal Querying (mentioning)

confidence: 99%
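
The weighted-sum temporal query described in the statement above can be illustrated with a minimal sketch. It is not V-FIRST's implementation: the function names, the use of cosine similarity, the default weight of 0.5, and the "first frame must precede second frame" constraint are all assumptions made for illustration, and the embeddings are assumed to be non-zero vectors of equal length.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_ordered_pairs(frames, q_first, q_second, weight=0.5, top_k=3):
    """Rank ordered frame pairs (i before j) within a single video.

    frames   -- frame embeddings in temporal order (hypothetical input)
    q_first  -- embedding of the query for the earlier scene
    q_second -- embedding of the query for the later scene
    weight   -- share given to the first query (assumed parameter)
    """
    s1 = [cosine(f, q_first) for f in frames]
    s2 = [cosine(f, q_second) for f in frames]
    # combinations() yields only index pairs with i < j, so temporal order holds.
    pairs = [
        (i, j, weight * s1[i] + (1.0 - weight) * s2[j])
        for i, j in combinations(range(len(frames)), 2)
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:top_k]
```

Each returned tuple holds the two frame indices and their combined score, so a browsing interface could display the pairs in ranked order. Scoring every pair is quadratic in the number of frames per video; a real system would likely prune candidates first.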

Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS

Lokoč, Andreadis, Bailer et al. 2023

Multimedia Systems

Self Cite

This paper presents the findings of the eleventh Video Browser Showdown competition, where sixteen teams competed in known-item and ad-hoc search tasks. Many of the teams utilized state-of-the-art video retrieval approaches that demonstrated high effectiveness in challenging search scenarios. In the paper, a broad survey of all utilized approaches is presented in connection with an analysis of the performance of participating teams. Specifically, both high-level performance indicators are presented with overall statistics as well as an in-depth analysis of the performance of selected tools implementing result set logging. The analysis reveals evidence that the CLIP model represents a versatile tool for cross-modal video retrieval when combined with interactive search capabilities. Furthermore, the analysis investigates the effect of different users and text query properties on the performance in search tasks. Last but not least, lessons learned from search task preparation are presented, and a new direction for ad-hoc search based tasks at Video Browser Showdown is introduced.

“…We propose a new idea for query expansion with the assistance of external search engine to find unknown/unfamiliar concepts (see Section 3.5). We also provide a simple sketch-based retrieval [23] so that users can quickly sketch out the scene of interest.…”

Section: System Overview (mentioning)

confidence: 99%

Flexible Interactive Retrieval SysTem 3.0 for Visual Lifelog Exploration at LSC 2022

Hoang-Xuan, Trang-Trung, Nguyen et al. 2022

Proceedings of the 5th Annual on Lifelog Search Challenge

Self Cite

Building a retrieval system with lifelogging data is more complicated than with ordinary data due to the redundancies, blurriness, massive amount of data, various sources of information accompanying lifelogging data, and especially the ad-hoc nature of queries. The Lifelog Search Challenge (LSC) is a benchmarking challenge that encourages researchers and developers to push the boundaries in lifelog retrieval. For LSC'22, we develop FIRST 3.0, a novel and flexible system that leverages expressive cross-domain embeddings to enhance the searching process. Our system aims to adaptively capture the semantics of an image at different levels of detail. We also propose to augment our system with an external search engine to help our system with initial visual examples for unfamiliar concepts. Finally, we organize image data in hierarchical clusters based on their visual similarity and location to assist users in data exploration. Experiments show that our system is both fast and effective in handling various retrieval scenarios.

CCS Concepts: Information systems → Search interfaces; Multimedia databases; Human-centered computing → Interactive systems and tools.
