This paper proposes a method to robustly monitor shelves in retail stores using supervised learning in order to improve on-shelf availability. To ensure high on-shelf availability, a key factor in improving retail profits, we focus on detecting changes in the amounts of products on the shelves, i.e., increases and decreases. Our method first detects changed regions of products in an image by background subtraction followed by moving-object removal. It then classifies the detected change regions into several classes representing the actual changes on the shelves, such as “product taken (decrease)” and “product replenished/returned (increase)”, by supervised learning with convolutional neural networks. Finally, it updates the shelf condition, which represents the presence/absence of products, using the classification results and computes the product amount visible in the image as on-shelf availability from the updated shelf condition. Three experiments were conducted using two videos captured from a surveillance camera on the ceiling of a real store. Results of the first and second experiments show the effectiveness of the product-change classification in our method. Results of the third experiment show that our method achieves a success rate of 89.6% for on-shelf availability when the error margin is within one product. This level of accuracy allows store clerks to maintain high on-shelf availability, enabling retail stores to increase profits.
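The detection-and-update pipeline described in the abstract can be illustrated with a minimal sketch. The function and class names below (`changed_mask`, `update_shelf`, the event labels `"taken"` and `"replenished"`) are hypothetical and for illustration only; the paper's CNN-based classifier is replaced here by pre-labeled change events, and the pixel-difference threshold stands in for its full background-subtraction stage.

```python
def changed_mask(background, frame, thresh=30):
    """Simple background subtraction: mark pixels whose grayscale
    intensity differs from the background model by more than thresh."""
    return [[abs(f - b) > thresh for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def update_shelf(count, capacity, events):
    """Apply classified change events to the shelf condition and return
    (product count, on-shelf availability). Each event is a (label, n)
    pair, where the labels mirror the change classes in the abstract."""
    for label, n in events:
        if label == "taken":            # product decrease
            count = max(0, count - n)
        elif label == "replenished":    # product increase or return
            count = min(capacity, count + n)
        # other classes (e.g. moving-object artifacts) leave the count unchanged
    return count, count / capacity
```

For example, starting from 10 of 12 products, two "taken" and one "replenished" event yield a count of 9 and an availability of 0.75.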
This paper proposes an image signature robust to caption superimposition for video sequence identification. The signature, a set of local features, is designed for high-speed frame-by-frame matching of video sequences. The signature of a frame is obtained by partitioning the image into blocks and extracting from each block a local feature representing the dominant edge direction. The similarity between two signatures is calculated by comparing the edge types of corresponding blocks and counting the number of blocks that have the same edge type. A weighting scheme based on the probability of caption superimposition for each block can be applied to the similarity calculation to improve matching performance. Experimental results on video sequence identification show that the proposed signature achieves a precision of 99.65% and a recall of 99.45%, improving both precision and recall by more than 30% compared with the conventional signature.
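The block-signature matching described above can be sketched as follows. This is an illustrative reading of the abstract, not the paper's implementation: the edge-type codes (0 = flat, 1 = horizontal gradient dominant, 2 = vertical gradient dominant), the `flat_thresh` parameter, and the function names are assumptions.

```python
def block_edge_type(block, flat_thresh=10):
    """Classify a grayscale block by its dominant gradient direction.
    0 = flat (no strong edge), 1 = horizontal gradient dominates
    (a vertical edge), 2 = vertical gradient dominates (a horizontal edge)."""
    gh = sum(abs(row[c + 1] - row[c])
             for row in block for c in range(len(row) - 1))
    gv = sum(abs(block[r + 1][c] - block[r][c])
             for r in range(len(block) - 1) for c in range(len(block[0])))
    if max(gh, gv) < flat_thresh:
        return 0
    return 1 if gh >= gv else 2


def signature_similarity(sig_a, sig_b, weights=None):
    """Fraction of blocks whose edge types agree. An optional per-block
    weight (e.g. low weight where captions are likely superimposed)
    implements the weighting scheme mentioned in the abstract."""
    if weights is None:
        weights = [1.0] * len(sig_a)
    matched = sum(w for a, b, w in zip(sig_a, sig_b, weights) if a == b)
    return matched / sum(weights)
```

With uniform weights, two four-block signatures differing in one block score 0.75; zeroing the weight of that (caption-prone) block raises the score to 1.0, which is the intended effect of the weighting scheme.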
This paper presents the core technologies of the Video Signature Tools recently standardized by ISO/IEC MPEG as an amendment to the MPEG-7 Standard (ISO/IEC 15938). The Video Signature is a high-performance content fingerprint which is suitable for desktop-scale to web-scale deployment and provides high levels of robustness to common video editing operations and high temporal localization accuracy at extremely low false alarm rates, achieving a detection rate on the order of 96% at a false alarm rate on the order of five false matches per million comparisons. The applications of the Video Signature are numerous and include rights management and monetization, distribution management, usage monitoring, metadata association, and corporate or personal database management. In this paper we review the prior work in the field, explain the standardization process and status, and provide details and evaluation results for the Video Signature Tools.