We describe a method for using crowd-sourced labor to track motion and, ultimately, annotate human gestures in video. Our chosen deployment platform, Amazon Mechanical Turk, divides labor into HITs (Human Intelligence Tasks). Given the informational density of video, our task is potentially larger than a traditional HIT, which involves processing a block of text or a single image. We exploit redundancies in video data so that workers' efforts are effectively multiplied: only a fraction of frames needs to be annotated by hand, yet we still achieve complete coverage of all video frames. This is accomplished by combining HITs built on a novel user interface with automatic techniques such as template tracking and affinity propagation clustering. In a case study, we show how to annotate a video database of political speeches with 2D positions and 3D hand pose configurations; this data is then used for some preliminary analytical tasks.
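The core idea, clustering frames so that only exemplars need hand annotation, can be sketched with off-the-shelf affinity propagation. This is an illustrative stand-in, not the paper's implementation: the synthetic `make_blobs` features substitute for real per-frame descriptors, and all variable names here are hypothetical.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Stand-in for per-frame feature vectors (a real pipeline might use
# downsampled pixels or pose descriptors): 300 "frames", 5 natural groups.
frame_features, _ = make_blobs(n_samples=300, centers=5, random_state=0)

# Affinity propagation picks exemplar frames automatically; only these
# exemplars would be sent to workers for hand annotation.
ap = AffinityPropagation(random_state=0).fit(frame_features)
exemplar_indices = ap.cluster_centers_indices_  # frames to annotate by hand
labels = ap.labels_                             # each frame -> its exemplar

print(len(exemplar_indices), "exemplars cover", len(frame_features), "frames")
```

Annotations made on the exemplars can then be propagated to every frame in the same cluster, which is how a fraction of hand-labeled frames can yield full coverage.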
This paper describes the design and preliminary development of a game with a purpose that aims to build a corpus of useful and original videos of human motion. This content is intended for use in machine learning and computer vision applications. The game, Motion Chain, encourages users to respond to text and video prompts by recording videos with a web camera. It seeks to entertain not through an explicit achievement or point system but through the fun of performance and the discovery inherent in observing other players. This paper describes two specific forms of the game, Chains and Charades, and proposes future possibilities. It covers the phases of game design as well as implementation details, then discusses an approach for evaluating the game's effectiveness.
This paper demonstrates how 3D skeletal reconstruction can be performed by applying a pose-sensitive embedding technique to multi-view video recordings. We apply our approach to challenging low-resolution video sequences. Skeletal reconstruction usually requires many calibrated high-resolution cameras; with such low-resolution imagery, typically only blob detection is feasible. We show that with this embedding technique (a metric-learning method built on a deep convolutional architecture), we achieve accurate 3D skeletal reconstruction on low-resolution outdoor scenes with many challenges.
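The metric-learning idea behind a pose-sensitive embedding can be illustrated with a minimal triplet loss: pull same-pose pairs together in embedding space and push different-pose pairs apart. This is a hedged sketch only; the paper's actual model is a deep convolutional encoder, for which a random linear map stands in here, and `embed`, `triplet_loss`, and the dimensions are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 32))  # stand-in "encoder" weights (not learned)

def embed(x):
    # Hypothetical embedding function: image features -> embedding vector.
    return W @ x

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard triplet objective: the anchor should sit closer to the
    # positive (same pose) than to the negative (different pose), by at
    # least `margin`; otherwise the hinge term is positive.
    d_pos = np.linalg.norm(embed(anchor) - embed(positive))
    d_neg = np.linalg.norm(embed(anchor) - embed(negative))
    return max(0.0, d_pos - d_neg + margin)

a, p, n = rng.normal(size=(3, 32))
print("loss:", triplet_loss(a, p, n))
```

Training a convolutional encoder to minimize such a loss over many pose triplets is what makes the resulting embedding "pose-sensitive": nearby points in embedding space correspond to similar body configurations, even at low resolution.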