Wearable computing technologies are advancing rapidly and enabling users to easily record daily activities for applications such as life-logging or health monitoring. Recognizing hand and object interactions in these videos will help broaden application domains, but recognizing such interactions automatically remains a difficult task. Activity recognition from the first-person point-of-view is difficult because the video includes constant motion, cluttered backgrounds, and sudden changes of scenery. Recognizing hand-related activities is particularly challenging due to the many temporal and spatial variations induced by hand interactions. We present a novel approach to recognize hand-object interactions by extracting both local motion features representing the subtle movements of the hands and global hand shape features to capture grasp types. We validate our approach on multiple egocentric action datasets and show that state-of-the-art performance can be achieved by considering both local motion and global appearance information.Index Terms-Wearable cameras, first-person point-ofview, activity recognition