This paper proposes new features for first-person activity recognition, extracted from images derived from optical flow. Features from convolutional neural networks (CNNs), which were designed for 2D images, have attracted attention from computer vision researchers because of their powerful discrimination capability, and a convolutional neural network for videos, called C3D (Convolutional 3D), was recently proposed. CNN / C3D features are usually extracted directly from the original images / videos with a pre-trained network, since the network was trained on images / videos. In this paper, by contrast, we propose feeding images derived from optical flow (which we call "optical flow images") into the pre-trained network, for two reasons: (i) optical flow images carry dynamic information, which is useful for activity recognition, whereas the original images carry only static information, and (ii) the pre-trained network can still extract features with reasonable discrimination capability, since it was trained on a huge number of images covering a broad range of categories. We carry out experiments on the DogCentric Activity Dataset and show the effectiveness of the extracted features.
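A minimal sketch of this idea follows. The abstract does not fix the flow algorithm, the flow-to-image encoding, or the pre-trained network, so the choices below are assumptions: OpenCV's Farneback dense flow, the common HSV visualization (direction as hue, magnitude as brightness), and a torchvision ResNet-18 with its classifier head removed as the fixed feature extractor.

```python
# Sketch: extract pre-trained CNN features from an "optical flow image"
# instead of the raw video frame. Flow algorithm, encoding, and backbone
# are illustrative assumptions, not the paper's exact pipeline.
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

def flow_to_image(prev_gray, next_gray):
    """Encode dense optical flow as an RGB image (hue = direction, value = magnitude)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*prev_gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2                              # direction -> hue
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # speed -> brightness
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)

# Pre-trained 2D CNN used as a fixed feature extractor (classifier head removed).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([T.ToPILImage(), T.Resize((224, 224)), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

def flow_features(prev_gray, next_gray):
    """512-d feature vector for one pair of consecutive grayscale frames."""
    img = flow_to_image(prev_gray, next_gray)
    with torch.no_grad():
        return backbone(preprocess(img).unsqueeze(0)).squeeze(0)
```

The HSV encoding is what makes the pre-trained network applicable: it turns a two-channel motion field into a three-channel image of the kind the network was trained on, so the network's learned filters can respond to motion patterns as if they were visual textures.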
In this paper, we present a system that automatically registers housewares in a room to a database in order to maintain an informationally structured environment. We assume that housewares requested by a user are likely to appear in the user's egocentric vision. The proposed system captures egocentric images with a smart glass, detects objects of multiple classes in the images using a CNN, and registers them to the database. We demonstrate that the developed system can automatically register several objects in a room.
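A sketch of the register-on-sight loop is below, under assumptions the abstract does not fix: torchvision's pre-trained Faster R-CNN stands in for the CNN detector, and SQLite stands in for the environment database; the smart-glass image arrives as an ordinary BGR frame.

```python
# Sketch: detect multi-class objects in one egocentric frame and register
# them to a database. Detector and database choices are assumptions.
import sqlite3
import cv2
import torch
from torchvision.models.detection import (fasterrcnn_resnet50_fpn,
                                          FasterRCNN_ResNet50_FPN_Weights)
import torchvision.transforms.functional as F

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = fasterrcnn_resnet50_fpn(weights=weights).eval()
labels = weights.meta["categories"]

db = sqlite3.connect("environment.db")
db.execute("CREATE TABLE IF NOT EXISTS objects (name TEXT, score REAL, "
           "x1 REAL, y1 REAL, x2 REAL, y2 REAL)")

def register_objects(frame_bgr, min_score=0.7):
    """Detect objects in one egocentric frame and insert them into the database."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        out = detector([F.to_tensor(rgb)])[0]
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if score >= min_score:
            db.execute("INSERT INTO objects VALUES (?, ?, ?, ?, ?, ?)",
                       (labels[label], float(score),
                        *[float(v) for v in box]))
    db.commit()
```

In a full system this function would run on each frame (or keyframe) from the smart glass, with duplicate suppression so the same cup on the same shelf is registered once rather than per frame.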
This paper proposes a new concept of "fourth-person sensing" for service robots. The proposed concept combines wearable cameras (the first-person viewpoint), sensors mounted on robots (the second-person viewpoint), and sensors embedded in the informationally structured environment (the third-person viewpoint). Each type of sensor has its own advantages and disadvantages, and the proposed concept compensates for the disadvantages of each by combining the advantages of all. It can be used to understand a user's intention and the context of the scene with high accuracy, which enables service robots to provide proactive services. As one application of the proposed concept, we developed an HCI system that combines first-person and third-person sensing. We show the effectiveness of the proposed concept through experiments.
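One hedged reading of how the viewpoints could be combined is a weighted fusion of per-viewpoint confidence scores over candidate user intentions; the representation and weights below are illustrative assumptions, not the paper's method.

```python
# Sketch: fuse intention estimates from the three sensor viewpoints.
# Each viewpoint reports {intention: confidence}; per-sensor reliability
# weights (assumed values) let strong viewpoints cover for weak ones.
from collections import defaultdict

def fuse_intentions(first_person, second_person, third_person,
                    weights=(0.5, 0.2, 0.3)):
    """Combine per-viewpoint intention scores into one ranked estimate."""
    fused = defaultdict(float)
    for w, scores in zip(weights, (first_person, second_person, third_person)):
        for intention, conf in scores.items():
            fused[intention] += w * conf
    return max(fused, key=fused.get)

# e.g. the wearable camera sees the user reach toward a cup while the
# room-embedded sensors report that the kettle has just boiled:
best = fuse_intentions({"drink tea": 0.6, "read": 0.4},
                       {"drink tea": 0.5},
                       {"drink tea": 0.8, "read": 0.1})
print(best)  # -> "drink tea"
```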