We introduce the publicly available TUM Kitchen Data Set as a comprehensive collection of activity sequences recorded in a kitchen environment equipped with multiple complementary sensors. The recorded data consists of observations of naturally performed manipulation tasks as encountered in everyday activities of human life. Several instances of a table-setting task were performed by different subjects, involving the manipulation of objects and the environment. We provide the original video sequences, full-body motion capture data recorded by a markerless motion tracker, RFID tag readings and magnetic sensor readings from objects and the environment, as well as corresponding action labels. In this paper, we both describe how the data was recorded and processed, in particular by the motion tracker and the labeling procedure, and give examples of what it can be used for. We present first results of an automatic method for segmenting the observed motions into semantic classes, and describe how the data can be integrated into a knowledge-based framework for reasoning about the observations.
We present a markerless tracking system for unconstrained human motions which are typical for everyday manipulation tasks. Our system is capable of tracking a high-dimensional human model (51 DOF) without restricting the type of motion and without the need for training sequences. The system reliably tracks humans that frequently interact with the environment, that manipulate objects, and that can be partially occluded by the environment. We describe and discuss two key components that substantially contribute to the accuracy and reliability of the system. First, a sophisticated hierarchical sampling strategy for recursive Bayesian estimation that combines partitioning with annealing strategies to enable efficient search in the presence of many local maxima. Second, a simple yet effective appearance model that allows for the combination of shape and appearance masks to implicitly deal with two cases of environmental occlusions by (1) subtracting dynamic non-human objects from the region of interest and (2) modeling objects (e.g. tables) that both occlude and can be occluded by human subjects. The appearance model is based on bit representations that make our algorithm well suited for implementation on highly parallel hardware such as commodity GPUs. Extensive evaluations on the HumanEva2 benchmarks show the potential of our method when compared to state-of-the-art Bayesian techniques. Besides the HumanEva2 benchmarks, we present results on more challenging sequences, including table-setting tasks in a kitchen environment and persons getting into and out of a car mock-up.
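The annealing half of such a hierarchical sampling strategy can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the 2-DOF toy "pose", the bimodal likelihood, the number of annealing layers, and the noise schedule are all our assumptions, chosen only to show how progressively sharpening a flattened likelihood lets particles escape local maxima before committing to the dominant one.

```python
import numpy as np

rng = np.random.default_rng(0)

def annealed_particle_filter(particles, log_likelihood, noise_scale=0.1,
                             beta_schedule=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """One annealing run for a single frame (assumed schedule, not the paper's).

    Early layers weight particles under a flattened likelihood (small beta),
    so weak local maxima do not trap the whole particle set; later layers
    sharpen the likelihood and shrink the diffusion noise.
    """
    for beta in beta_schedule:
        # Weight particles under the annealed (flattened) likelihood.
        logw = beta * np.array([log_likelihood(p) for p in particles])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample proportionally to the annealed weights.
        idx = rng.choice(len(particles), size=len(particles), p=w)
        particles = particles[idx]
        # Diffuse with noise that shrinks as the layers sharpen.
        particles = particles + rng.normal(
            0.0, noise_scale * (1.0 - beta + 0.05), particles.shape)
    return particles

# Toy 2-DOF "pose" space: a dominant mode at (2, 2) and a weaker,
# broader local maximum at (-2, -2).
def log_likelihood(x):
    return np.logaddexp(-np.sum((x - 2.0) ** 2) / 0.1,
                        -np.sum((x + 2.0) ** 2) / 0.5 - 6.0)

init = rng.normal(0.0, 3.0, size=(500, 2))     # broad initial guess
result = annealed_particle_filter(init, log_likelihood)
print(np.round(result.mean(axis=0), 1))
```

In a full tracker, partitioning would additionally split the 51-DOF state into kinematic-chain subsets (e.g. torso before limbs) so each annealing run searches a lower-dimensional subspace; the sketch above shows only the annealing schedule on a single joint pair.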
This paper introduces the Assistive Kitchen as a comprehensive demonstration and challenge scenario for technical cognitive systems. We describe its hardware and software infrastructure. Within the Assistive Kitchen application, we select particular domain activities as research subjects and identify the cognitive capabilities needed for perceiving, interpreting, analyzing, and executing these activities as research foci. We conclude by outlining open research issues that need to be solved to realize the scenarios successfully.