Instance search retrospective with focus on TRECVID

Awad, George; Kraaij, Wessel; Over, Paul; Satoh, Shin’ichi

doi:10.1007/s13735-017-0121-3

Cited by 20 publications

(20 citation statements)

References 74 publications

“…Through the main operations of these two techniques, video image data are analyzed and expressed to locate the position of the human target and use the data for subsequent classification and recognition [11]. There are many similarities between human segmentation and tracking techniques, one of which is that both methods operate with targets from video image capture, and both video images contain commonly desired target behavior segments; both techniques have high requirements in terms of real-time, accuracy, and robustness of image processing.…”

Section: Related Work (mentioning)

confidence: 99%

[Retracted] Mean Shift Fusion Color Histogram Algorithm for Nonrigid Complex Target Tracking in Sports Video

Liu

Wang

2021

Complexity

We analyze and study the tracking of nonrigid complex targets of sports video based on mean shift fusion color histogram algorithm. A simple and controllable 3D template generation method based on monocular video sequences is constructed, which is used as a preprocessing stage of dynamic target 3D reconstruction algorithm to achieve the construction of templates for a variety of complex objects, such as human faces and human hands, broadening the use of the reconstruction method. This stage requires video sequences of rigid moving target objects or sets of target images taken from different angles as input. First, the standard rigid body method of Visuals is used to obtain the external camera parameters of the sequence frames as well as the sparse feature point reconstruction data, and the algorithm has high accuracy and robustness. Then, a dense depth map is computed for each input image frame by the Multi-View Stereo algorithm. The depth reconstruction with a too high resolution not only increases the processing time significantly but also generates more noise, so the resolution of the depth map is controlled by parameters. The multiple hypothesis target tracking algorithms are used to track multiple targets, while the chunking feature is used to solve the problem of mutual occlusion and adhesion between targets. After finishing the matching, the target and background models are updated online separately to ensure the validity of the target and background models. Our results of nonrigid complex target tracking by mean shift fusion color histogram algorithm for sports video improve the accuracy by about 8% compared to other studies. The proposed tracking method based on the mean shift algorithm and color histogram algorithm can not only estimate the position of the target effectively but also depict the shape of the target well, which solves the problem that the nonrigid targets in sports video have complicated shapes and are not easy to track. 
An example is given to demonstrate the effectiveness and adaptiveness of the applied method.

“…Instance search was addressed as a sub-image retrieval task before CNNs were introduced [1] to visual object detection. The main image features being employed for this task are hand-crafted local descriptors such as SIFT and SURF.…”

Section: Related Work 2.1 Instance Search (mentioning)

confidence: 99%

“…When instance search was first addressed in [1], the problem was coined as a sub-image retrieval task. Handcrafted features such as SIFT [22] and SURF [3] that are superior in local image matching were de-facto descriptors at that time.…”

Section: Introduction (mentioning)

confidence: 99%

“…Handcrafted features such as SIFT [22] and SURF [3] that are superior in local image matching were de-facto descriptors at that time. Although encouraging results are reported [1], these approaches are known to be limited to match textureless image patches and instances undergone non-rigid motions. While most of the descriptors are capable of generating thousands of local features from an image for matching, these features are extracted from regions rich of textures or corners.…”

Section: Introduction (mentioning)

confidence: 99%

Deeply Activated Salient Region for Instance Search

Xiao

Zhao

Jie

et al. 2022

ACM Trans. Multimedia Comput. Commun. Appl.

The performance of instance search relies heavily on the ability to locate and describe a wide variety of object instances in a video/image collection. Due to the lack of a proper mechanism for locating instances and deriving feature representation, instance search is generally only effective when the instances are from known object categories. In this paper, a simple but effective instance-level feature representation approach is presented. Different from the existing approaches, the issues of class-agnostic instance localization and distinctive feature representation are considered. The former is achieved by detecting salient instance regions from an image by a layer-wise back-propagation process. The back-propagation starts from the last convolution layer of a pre-trained CNNs that is originally used for classification. The back-propagation proceeds layer-by-layer until it reaches the input layer. This allows the salient instance regions in the input image from both known and unknown categories to be activated. Each activated salient region covers the full, or more usually, a major range of an instance. The distinctive feature representation is produced by average-pooling on the feature map of a certain layer with the detected instance region. Experiments show that this kind of feature representation demonstrates considerably better performance than most of the existing approaches.
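The pooling step described in this abstract (averaging a feature map over the detected instance region) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `masked_average_pool`, the nested-list layout (channels × height × width), and the assumption that the region mask has already been resized to the feature-map resolution are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def masked_average_pool(
    feature_map: Sequence[Sequence[Sequence[float]]],
    region_mask: Sequence[Sequence[int]],
) -> List[float]:
    """Average each channel over the positions selected by the mask.

    feature_map: channels x height x width activations (hypothetical layout).
    region_mask: height x width, truthy where the instance region lies.
    Returns one value per channel, i.e. a region-level descriptor.
    """
    # Collect the (row, col) positions that fall inside the region.
    positions = [
        (r, c)
        for r, row in enumerate(region_mask)
        for c, inside in enumerate(row)
        if inside
    ]
    if not positions:
        raise ValueError("region mask selects no positions")
    # Average every channel over exactly those positions.
    return [
        sum(channel[r][c] for r, c in positions) / len(positions)
        for channel in feature_map
    ]


# Toy example: 2 channels on a 2x3 grid; the region covers the right two columns.
toy_features = [
    [[1, 2, 3], [4, 5, 6]],
    [[0, 0, 10], [0, 20, 30]],
]
toy_mask = [
    [0, 1, 1],
    [0, 1, 1],
]
descriptor = masked_average_pool(toy_features, toy_mask)
print(descriptor)  # [4.0, 15.0]
```

The resulting per-channel vector would then serve as the instance descriptor for matching; how the region mask itself is obtained (the layer-wise back-propagation in the abstract) is outside the scope of this sketch.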

“…Video summarization techniques create automatic video summaries by meeting three requirements: The presence of relevant video entities and events, elimination of redundant information, and generation of as much useful information as possible (Truong & Venkatesh, 2007). Truong & Venkatesh (2007) describe some video summarization applications such as browsing and retrieval, which is responsible for assisting users on searching and browsing tasks (Awad et al, 2017b;Arman et al, 1994;Zhang et al, 1997;Haojin Yang & Meinel, 2014), computational reduction and content analysis, used on semantic abstraction of information to reduce the computational complexity (Plummer et al, 2017), story navigation and video editing, which help users on navigating through a video (Nguyen et al, 2012), and highlighting, targeted on detection of important events in videos (Yao et al, 2016;Gygli et al, 2014;Xiong et al, 2003). On each of these applications, video summarization techniques try to mimic the ways humans comprehend the most important parts of a video.…”

Section: Introduction (mentioning)

confidence: 99%

Modelling perceptions on the evaluation of video summarization

Abdalla

Menezes

Oliveira

2019

Expert Systems with Applications

Hours of video are uploaded to streaming platforms every minute, with recommender systems suggesting popular and relevant videos that can help users save time in the searching process. Recommender systems regularly require video summarization as an expert system to automatically identify suitable video entities and events. Since there is no well-established methodology to evaluate the relevance of summarized videos, some studies have made use of user annotations to gather evidence about the effectiveness of summarization methods. Aimed at modelling the user's perceptions, which ultimately form the basis for testing video summarization systems, this paper seeks to propose: (i) A guideline to collect unrestricted user annotations, (ii) a novel metric called compression level of user annotation (CLUSA) to gauge the performance of video summarization methods, and (iii) a study on the quality of annotated video summaries collected from different assessment scales. These contributions lead to benchmarking video summarization methods with no constraints, even if user annotations are collected from different assessment scales for each method. Our
