Much research on content-based video indexing and retrieval, as well as on video search engines, depends on a large-scale video dataset. Unfortunately, the scarcity of open-source datasets complicates the exploration of novel approaches. Existing video datasets that index video files hosted on public streaming services serve other purposes, such as annotation, learning, classification, and other computer vision tasks, with little attention to indexing public video links for search and retrieval. This paper introduces a novel large-scale dataset of YouTube video links used to evaluate the proposed content-based video search engine. The dataset gathers 1088 videos, representing more than 65 hours of video, 11,000 video shots, and 66,000 marked and unmarked keyframes, with 80 different object names used for marking. It also provides a state-of-the-art feature vector and combinational-based matching, which benefit the accuracy, speed, and precision of the video retrieval process. Each video record in the dataset is represented by three features: a temporal combination vector, an object combination vector with shot annotations, and 6 keyframes, alongside other metadata. Video classification was also applied to the dataset to improve the efficiency of video-based query retrieval. A two-phase approach based on object and event classification stores video records in aggregations derived from the extracted feature vectors: the object aggregation stores each video record under the extracted object/concept with the maximal occurrence across all shots, while the event aggregation groups records by the number of shots per video. This study indexed 58 of the 80 object/concept categories, each with 9 shot-number groups.
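The two-phase aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout (per-shot object lists) and the shot-count group boundaries are assumptions made for the example.

```python
from collections import Counter

# Hypothetical shot-count group boundaries (the paper specifies 9 groups
# but not their ranges; these values are illustrative only).
SHOT_GROUPS = (1, 3, 5, 10, 20, 40, 60, 80, 100)

def classify_record(shots, shot_groups=SHOT_GROUPS):
    """Assign a video record (a list of per-shot object/concept lists)
    to an object aggregation and an event aggregation."""
    # Phase 1 (object aggregation): the object/concept with maximal
    # occurrence across all shots of the video.
    counts = Counter(obj for shot in shots for obj in shot)
    object_key = counts.most_common(1)[0][0] if counts else None

    # Phase 2 (event aggregation): group by the number of shots per video,
    # taking the smallest group boundary that covers the shot count.
    n_shots = len(shots)
    event_key = next((g for g in shot_groups if n_shots <= g), shot_groups[-1])
    return object_key, event_key
```

For example, a two-shot video in which "car" appears in both shots would be stored under the "car" object aggregation and the 3-shot event group.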
Many studies of content-based video search engines focus on content-based query retrieval, where a query by example is submitted to retrieve a list of visually similar videos. Far less research addresses indexing and searching public video streaming services such as YouTube, where misuse of copyrighted video material and detection of manipulated bootleg videos before upload remain open problems. In this paper, a novel and effective technique for a content-based video search engine with effective bootleg-video detection is evaluated on a large-scale video index dataset of 1088 video records. A novel feature vector is introduced using the temporal and key-object/concept features of video shots, applying combinational-based matching algorithms with various similarity metrics for evaluation. The retrieval system was evaluated using more than 200 non-semantic video queries covering both normal and bootleg videos. For normal videos, retrieval precision was 97.9% and recall 100%, combined into an F1 measure of 98.3%; for bootleg videos, precision was 99.2% and recall 96.7%, combined into an F1 measure of 97.9%. These results suggest that the technique can enhance both traditional text-based search engines and commonly used bootleg-detection techniques.
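The F1 measure cited above combines precision and recall as their harmonic mean. A minimal sketch of the standard formula (the function name is ours, not the paper's):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall.
    Accepts either fractions (0..1) or percentages, as long as
    both arguments use the same scale."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Applied to the bootleg-video figures, `f1_score(99.2, 96.7)` yields approximately 97.9, matching the reported value.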
Content-based video search engines (CBVSE) are broadly needed by mainstream video search engines that retrieve videos from public video streaming services over the Internet, such as YouTube. These are mostly text-based search engines that index and retrieve videos using the text surrounding the video on its web page. This paper attempts to improve the performance of a previously developed content-based video search engine designed first to index videos on YouTube and then to search and retrieve videos using a non-semantic video query. A large-scale dataset containing more than 1088 YouTube video records was indexed. Each record contains a feature vector of the temporal set, key-object sets, and keyframes representing the video shots in each video file, in addition to the URL and other information gathered from each video's web page.
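The per-video record described above could be modeled as shown below. This is an illustrative sketch only; the field names and types are assumptions, not the authors' actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class VideoRecord:
    """One indexed YouTube video, per the dataset description (field
    names are hypothetical)."""
    url: str                          # YouTube video page URL
    temporal_set: List[float]         # temporal features of the video shots
    key_object_sets: List[List[str]]  # key objects/concepts per shot
    keyframes: List[str]              # identifiers of representative keyframes
    metadata: Dict[str, str] = field(default_factory=dict)  # other page info
```

A record would then bundle everything needed for combinational-based matching, e.g. `VideoRecord(url="https://youtube.com/...", temporal_set=[0.0, 4.2], key_object_sets=[["car"], ["person"]], keyframes=["kf1", "kf2"])`.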