VISIONE at Video Browser Showdown 2022

Amato, Giuseppe; Bolettieri, Paolo; Carrara, Fabio; Falchi, Fabrizio; Gennaro, Claudio; Messina, Nicola; Vadicamo, Lucia; Vairo, Claudio

doi:10.1007/978-3-030-98355-0_52

Cited by 10 publications (10 citation statements)

References 24 publications (32 reference statements)


“…[Footnote 2: https://lucene.apache.org/] [Footnote 3: https://github.com/facebookresearch/faiss] [Footnote 4: We leave the investigation of a STR technique that is suitable for indexing this type of dense vector to future work.] Specifically, it implements object queries by placing the desired objects or colors in a canvas, it allows video searching by specifying natural language descriptions of desired keyframes or shots, and it supports temporal queries for finding consecutive specific events.…”

Section: Discussion (mentioning)

confidence: 99%

“…Therefore, for the CLIP2Video features, the approximated cosine similarity computed in the STR representation badly approximates the original one. For these reasons, for the CLIP-based features, we instead relied on the FAISS index, using an exact search and an 8-bit scalar quantization to reduce the index size in memory [footnote 4]. Despite the exact search, with the in-memory quantized index, the search over the full V3C1 + V3C2 shots takes only a few milliseconds at a cost of much bigger memory utilization.…”

Section: Indexing (mentioning)

confidence: 99%
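The statement above describes an exact (exhaustive) search over vectors stored with 8-bit scalar quantization. The following is only a minimal pure-Python sketch of that idea, not the FAISS implementation or the VISIONE code: the helper names (`train_ranges`, `encode`, `decode`, `search`), the per-dimension min/max ranges, and the toy vectors are all assumptions made for illustration.

```python
def train_ranges(vectors):
    """Learn per-dimension [min, max] ranges from the data (assumed scheme)."""
    dims = range(len(vectors[0]))
    lo = [min(v[i] for v in vectors) for i in dims]
    hi = [max(v[i] for v in vectors) for i in dims]
    return lo, hi


def encode(vec, lo, hi):
    """Map each float component to one byte (0..255) within its range."""
    codes = []
    for x, l, h in zip(vec, lo, hi):
        span = (h - l) or 1.0  # avoid division by zero for constant dimensions
        q = round(255 * (x - l) / span)
        codes.append(min(255, max(0, q)))
    return bytes(codes)


def decode(code, lo, hi):
    """Reconstruct approximate floats from the one-byte codes."""
    return [l + (c / 255) * ((h - l) or 1.0) for c, l, h in zip(code, lo, hi)]


def search(query, codes, lo, hi, k=1):
    """Exhaustive scan: score every stored code by inner product with the query."""
    scored = []
    for idx, code in enumerate(codes):
        approx = decode(code, lo, hi)
        scored.append((sum(q * a for q, a in zip(query, approx)), idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy data standing in for CLIP-like features (hypothetical values).
vectors = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
lo, hi = train_ranges(vectors)
codes = [encode(v, lo, hi) for v in vectors]
top = search([1.0, 0.0], codes, lo, hi, k=2)
```

Each stored vector takes one byte per dimension instead of a 4-byte float, and every vector is still scored, so the only approximation comes from the quantization step and not from skipping candidates.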

“…This demonstration paper presents the latest release of VISIONE [1,2,4,5,6], an interactive large-scale video search system.…”

Section: Introduction (mentioning)

confidence: 99%


VISIONE: A Large-Scale Video Retrieval System with Advanced Search Functionalities

Amato, Bolettieri, Carrara, et al. 2023

Proceedings of the 2023 ACM International Conference on Multimedia Retrieval


VISIONE is a large-scale video retrieval system that integrates multiple search functionalities, including free text search, spatial color and object search, visual and semantic similarity search, and temporal search. The system leverages cutting-edge AI technology for visual analysis and advanced indexing techniques to ensure scalability. As demonstrated by its runner-up position in the 2023 Video Browser Showdown competition, VISIONE effectively integrates these capabilities to provide a comprehensive video retrieval solution. A system demo is available online, showcasing its capabilities on over 2300 hours of diverse video content (V3C1+V3C2 dataset) and 12 hours of highly redundant content (Marine dataset). The demo can be accessed at https://visione.isti.cnr.it/.


Section: Discussion (mentioning)

confidence: 99%

Section: Indexing (mentioning)

confidence: 99%


VISIONE: A Large-Scale Video Retrieval System with Advanced Search Functionalities

Amato, Bolettieri, Carrara, et al. 2023

Proceedings of the 2023 ACM International Conference on Multimedia Retrieval


“…[Table excerpt: each row gives a model, then the VBS systems using it]
… [99], VISIONE [31]
OpenCLIP ViT-L/14 trained with LAION-400m [60]: diveXplore [100]
OpenCLIP ViT-B/32 trained with LAION-2B [12], [60]: 4MR [34]
OpenCLIP ViT-B/32 xlm roberta base model trained with LAION-5B [13], [60]: vitrivr [96], vitrivr-VR [107]
CLIP [5], [86]: CVHunter [71], vitrivr [96], vitrivr-VR [107]
CLIP2Video [6], [45]: VISIONE [31]
BLIP [3], [66]: QIVISE [103]
CLIP4Clip [7], [77]: VIREO [79]
Custom cross-modal network [20], [46] combining multiple textual and visual features and employing OpenCLIP ViT-B/32 [60], [86], ResNet-152 [53], and ResNeXt-101 [80]: Verge [84]
ITV [116]: VIREO [79]
ALADIN [2], [81]: VISIONE [31]
custom model [24], [105]: vitrivr [96], vitrivr-VR [107]

The VBS systems have greatly evolved in recent years, offering innovative approaches to efficiently explore and retrieve information from large video collections. Almost all these systems exploit joint text-visual embeddings to enhance the search experience and provide more accurate results.…”

Section: Model System (mentioning)

confidence: 99%

Evaluating Performance and Trends in Interactive Video Retrieval: Insights From the 12th VBS Competition

Vadicamo, Arnold, Bailer, et al. 2024

IEEE Access


This paper conducts a thorough examination of the 12th Video Browser Showdown (VBS) competition, which is a well-established international benchmarking campaign for interactive video search systems. The annual VBS competition has witnessed a steep rise in the popularity of multimodal embedding-based approaches in interactive video retrieval. The majority of the thirteen systems participating in VBS 2023 utilized a CLIP-based cross-modal search model, allowing the specification of free-form text queries to search visual content. This shared emphasis on joint embedding models contributed to balanced performance across various teams. However, the distinguishing factors of the top-performing teams included the adept combination of multiple models and search modes, along with the capabilities of interactive interfaces to facilitate and refine the search process. Our work provides an overview of the state-of-the-art approaches employed by the participating systems and conducts a thorough analysis of their search logs, which record user interactions and results of their queries for each task. Our comprehensive examination of the VBS competition offers assessments of the effectiveness of the retrieval models employed, the browsing efficiency, and user query patterns. Additionally, it provides valuable insights into the evolving landscape of interactive video retrieval and its future challenges.


“…STR-based methods, on the other hand, rely on transformations that sparsify data and encode it as small sets of codewords indexed on standard text engines [9,2,4]. These approaches are successfully used to solve multimodal queries for combined text search with image similarity [1,3].…”

Section: Introduction (mentioning)

confidence: 99%

Approximate Nearest Neighbor Search on Standard Search Engines

Carrara, Vadicamo, Gennaro, et al. 2022

Similarity Search and Applications

Self Cite


Approximate search for high-dimensional vectors is commonly addressed using dedicated techniques often combined with hardware acceleration provided by GPUs, FPGAs, and other custom in-memory silicon. Despite their effectiveness, harmonizing those optimized solutions with other types of searches often poses technological difficulties. For example, to implement a combined text+image multimodal search, we are forced first to query the index of high-dimensional image descriptors and then filter the results based on the textual query or vice versa. This paper proposes a text surrogate technique to translate real-valued vectors into text and index them with a standard textual search engine such as Elasticsearch or Apache Lucene. This technique allows us to perform approximate kNN searches of high-dimensional vectors alongside classical full-text searches natively on a single textual search engine, enabling multimedia queries without sacrificing scalability. Our proposal exploits a combination of vector quantization and scalar quantization. We compared our approach to the existing literature in this field of research, demonstrating a significant improvement in performance through preliminary experimentation.
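To make the text surrogate idea concrete, here is a simplified sketch using scalar quantization only. It is not the method proposed in the paper, which also combines vector quantization; the token naming (`f0`, `f1`, ...), the `scale` factor, and the plain term-frequency score are assumptions chosen for illustration. Each vector component becomes a synthetic word repeated in proportion to its value, so that a text engine's term-frequency matching roughly tracks the inner product.

```python
from collections import Counter


def surrogate_text(vec, scale=10):
    """Encode a vector as text: token 'f<i>' repeated round(scale * v_i) times.

    Negative components are clipped to zero in this simplified sketch.
    """
    tokens = []
    for i, value in enumerate(vec):
        reps = max(0, round(scale * value))
        tokens.extend([f"f{i}"] * reps)
    return " ".join(tokens)


def tf_score(query_text, doc_text):
    """Inner product of term-frequency vectors (a stand-in for engine scoring)."""
    q = Counter(query_text.split())
    d = Counter(doc_text.split())
    return sum(q[t] * d[t] for t in q)


# Hypothetical toy vectors; a real system would index the surrogate texts
# in a full-text engine rather than a Python dict.
docs = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.1, 0.8, 0.3],
}
query = [1.0, 0.0, 0.0]

index = {name: surrogate_text(v) for name, v in docs.items()}
q_text = surrogate_text(query)
ranking = sorted(index, key=lambda n: tf_score(q_text, index[n]), reverse=True)
```

Because the surrogate documents are ordinary strings of words, they could in principle sit in the same index as real text fields, which is what allows vector similarity and full-text conditions to be combined in a single query.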

