Panorama video is becoming increasingly popular. We present an end-to-end real-time system for interactively zooming and panning into high-resolution panoramic videos. In contrast to existing systems, which crop perspective panoramas, our approach creates a cylindrical panorama. The perspective is then corrected in real time, which yields a better and more natural zoom. Our experimental results also indicate that such zoomed virtual views can be generated in a time far below the frame-rate threshold. Taking into account recent trends in device development, our approach should be able to scale to a large number of concurrent users in the near future.
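The abstract does not spell out how the perspective correction is computed. As a rough illustration only, the following sketch uses generic pinhole-to-cylinder geometry (not the authors' implementation) to show how a pixel of a zoomed virtual view could be mapped to a position on a cylindrical panorama. The function name, the parameters (`focal`, `pan`, `tilt`), and the unit-radius cylinder are all assumptions made for this example.

```python
import math


def view_pixel_to_cylinder(u, v, width, height, focal, pan, tilt):
    """Map pixel (u, v) of a hypothetical virtual pinhole view to (theta, h).

    theta is the angle around a unit-radius cylinder (radians) and h is the
    height on that cylinder. Zoom is controlled by `focal` (in pixels).
    Assumed conventions: x points right, y points down, z points forward;
    a positive tilt looks upward.
    """
    # Ray through the pixel in the virtual camera's own coordinate frame.
    x = u - width / 2.0
    y = v - height / 2.0
    z = focal

    # Tilt: rotate the ray about the x axis.
    ct, st = math.cos(tilt), math.sin(tilt)
    y, z = y * ct - z * st, y * st + z * ct

    # Pan: rotate the ray about the y axis (the cylinder axis).
    cp, sp = math.cos(pan), math.sin(pan)
    x, z = x * cp + z * sp, -x * sp + z * cp

    # Intersect the ray with the unit cylinder around the y axis.
    # A ray pointing exactly along the axis has no intersection; this
    # sketch does not handle that degenerate case.
    theta = math.atan2(x, z)
    h = y / math.hypot(x, z)
    return theta, h
```

Evaluating this mapping for every output pixel and sampling the panorama at the resulting (theta, h) would give a perspective-corrected view; a plain crop skips this step, which is where the distortion would come from.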
We introduce a new algorithm for image segmentation based on crowdsourcing through a game: Ask'nSeek. The game provides information on the objects of an image in the form of clicks that are either on the object or on the background. These logs are then used to determine the best segmentation for an object among a set of candidates generated by the state-of-the-art CPMC algorithm. We also introduce a simulator that generates game logs and therefore gives insight into the number of games needed on an image to perform acceptable segmentation.
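The abstract does not state how the candidate segmentations are scored against the game logs. A minimal sketch of one plausible criterion, assuming each click is labeled as on-object or on-background and each candidate is a binary mask, is to pick the candidate that agrees with the largest fraction of clicks. The names `click_agreement` and `best_candidate` and the data layout are hypothetical, not taken from the paper or from CPMC.

```python
from typing import List, Sequence, Tuple

# Assumed layout: mask[row][col] is True where the candidate marks the object.
Mask = Sequence[Sequence[bool]]
# Assumed click record: (row, col, on_object), with on_object False for background.
Click = Tuple[int, int, bool]


def click_agreement(mask: Mask, clicks: List[Click]) -> float:
    """Fraction of clicks whose label matches the mask at the click position."""
    if not clicks:
        return 0.0
    hits = sum(1 for row, col, on_object in clicks
               if bool(mask[row][col]) == on_object)
    return hits / len(clicks)


def best_candidate(candidates: List[Mask], clicks: List[Click]) -> int:
    """Index of the candidate mask with the highest click agreement.

    Ties are resolved in favor of the earliest candidate.
    """
    scores = [click_agreement(mask, clicks) for mask in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

With this kind of score, more games on an image mean more clicks and therefore a finer discrimination between similar candidates, which is the quantity a log simulator would help estimate.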
Keywords
Crowdsourcing, figure-ground segmentation, labeling game, human computing
MOTIVATION
Semantic annotation of visual content is a process that requires linking pixels within an image with the semantic concepts associated with each group of pixels. This is often done in a user-assisted way. There exist several levels of interaction between users and visual content, ranging from an intentional and accurate annotation targeted at generating high-quality labels (e.g., LabelMe [11]), to a completely unintentional process that is more likely to be noisy (e.g., textual contents surrounding images on web pages and multimedia documents).
This paper proposes a method for semantic object segmentation, at a pixel level, based on high-quality annotation data collected from an online game: Ask'nSeek [1]. Ask'nSeek is a two-player game where one player (the master) places a small square on the image to be segmented; this square is invisible to the other player (the seeker). The goal of the seeker is to click inside the hidden square, i.e., to guess the square's location. To do so, the seeker requests clues from the master, each consisting of the relative position of the square with respect to a semantic object within the image.
The name of the object is typed in by the seeker, thereby providing a semantic label for the clicks.
Our approach uses the information collected from game logs to seed a semi-supervised image segmentation algorithm, which eventually extracts the objects in the image from the surrounding background. Contrary to most existing human-assisted image segmentation solutions, the proposed process is unintentional on the user's side: the user is engaged in an online game that scales to crowds and whose objective is not image segmentation.
While the previous work on Ask'nSeek [1] aimed at solving the obje...
Object recognition based on local features computed at multiple locations is robust to occlusions, strong viewpoint changes and object deformations. These features should be repeatable, precise and distinctive. We present an operator for repeatable feature detection on depth images (relative to 3D models) as well as 2D intensity images. The proposed detector is based on estimating the curviness saliency at multiple scales in each kind of image. We also propose quality measures that evaluate the repeatability of the features between depth and intensity images. The experiments show that the proposed detector outperforms both the most powerful, classical point detectors (e.g., SIFT) and edge detection techniques.