In the field of pervasive computing, wearable devices have been widely used for recognizing human activities. One important area in this research is the recognition of activities of daily living, where inertial sensors and interaction sensors (such as RFID tags with scanners) are especially popular choices as data sources. Using interaction sensors, however, has one drawback: they may not differentiate between a proper interaction and the mere touching of an object. A positive signal from an interaction sensor is not necessarily caused by a performed activity, e.g., when an object is only touched but no interaction occurs afterwards. There are, however, many scenarios, such as medicine intake, that rely heavily on correctly recognized activities. In our work, we aim to address this limitation and present a multimodal, egocentric-based activity recognition approach. Our solution relies on object detection to recognize activity-critical objects in a frame. As it is infeasible to always expect a high-quality camera view, we enrich the vision features with inertial sensor data that monitors the user's arm movement. In this way, we aim to overcome the drawbacks of each respective sensor. We present our results of combining inertial and video features to recognize human activities across different types of scenarios, achieving an F1-measure of up to 79.6%.