In this paper we present a novel framework for the simultaneous detection of click actions and estimation of occluded fingertip positions from egocentrically viewed single depth-image sequences. For the detection and estimation, we introduce a novel probabilistic inference based on knowledge priors of clicking motion and clicked position. Based on the detection and estimation results, we achieve fine-grained bare-hand interaction with virtual objects from an egocentric viewpoint. Our contributions are threefold: (i) a rotation- and translation-invariant finger-clicking action and position estimation that combines 2D image-based fingertip detection with 3D hand-posture estimation in an egocentric viewpoint; (ii) a novel spatio-temporal random forest, which performs the detection and estimation efficiently in a single framework; and (iii) a selection process exploiting the proposed clicking-action detection and position estimation in an arm-reachable AR/VR space, requiring no additional device. Experimental results show that the proposed method delivers promising performance under frequent self-occlusions when selecting objects in AR/VR space while wearing an HMD with an attached egocentric depth camera.
In this paper, we present a novel single-shot face-related task analysis method, called Face-SSD, for detecting faces and performing various face-related (classification/regression) tasks, including smile recognition, face attribute prediction, and valence-arousal estimation in the wild. Face-SSD uses a Fully Convolutional Neural Network (FCNN) to detect multiple faces of different sizes and to recognise/regress one or more face-related classes. Face-SSD has two parallel branches that share the same low-level filters, one branch dealing with face detection and the other with face analysis tasks. The outputs of both branches are spatially aligned heatmaps that are produced in parallel; therefore, Face-SSD does not require face detection, facial region extraction, size normalisation, and facial region processing to be performed in subsequent steps. Our contributions are threefold: 1) Face-SSD is the first network to perform face analysis without relying on pre-processing such as face detection and registration in advance: it is a simple, single FCNN architecture that simultaneously performs face detection and face-related task analysis, which are conventionally treated as separate consecutive tasks; 2) Face-SSD is a generalised architecture that is applicable to various face analysis tasks without modifying the network structure, in contrast to designing task-specific architectures; and 3) Face-SSD achieves real-time performance (21 FPS) even when detecting multiple faces and recognising multiple classes in a given image (300 × 300). Experimental results show that Face-SSD achieves state-of-the-art performance in various face analysis tasks, reaching a recognition accuracy of 95.76% for smile detection, 90.29% for attribute prediction, and Root Mean Square (RMS) errors of 0.44 and 0.39 for valence and arousal estimation.

Recent studies design specific architectures for each individual face analysis task.
Although some works propose unified frameworks for handling multiple face-related tasks [56,3,35], several open issues remain to be explored:
• Unconstrained conditions: Most existing approaches require a detected and normalised face as input.
• Scalability: Most methods design separate networks for different tasks. However, networks that are specifically designed to maximise performance on certain tasks cannot easily be adapted to other types of face analysis tasks.
• Real-time performance: Existing methods do not achieve real-time performance because they require time-consuming pre-processing steps.
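The shared-backbone, two-branch design described above can be sketched in a few lines of numpy. This is an illustrative toy only: the weights are random, the layer counts and channel sizes are assumptions, and the real Face-SSD is a much deeper SSD-style FCNN. The point it demonstrates is structural: both branches consume the same low-level features and, because all layers are convolutional with 'same' padding, their output heatmaps stay spatially aligned with each other.

```python
import numpy as np

def conv3x3(x, w):
    """'Same' 3x3 convolution: x is (C_in, H, W), w is (C_out, C_in, 3, 3)."""
    c_out = w.shape[0]
    c_in, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))          # zero padding keeps H, W
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(c_in):
            for dy in range(3):
                for dx in range(3):
                    out[o] += w[o, i, dy, dx] * xp[i, dy:dy + h, dx:dx + wd]
    return out

def two_branch_fcnn(image, w_shared, w_det, w_task):
    """One shared conv layer feeding two parallel heads (toy Face-SSD shape)."""
    shared = np.maximum(conv3x3(image, w_shared), 0)  # shared low-level filters + ReLU
    det_heatmap = conv3x3(shared, w_det)              # face-detection branch
    task_heatmap = conv3x3(shared, w_task)            # face-analysis branch (e.g. smile)
    return det_heatmap, task_heatmap

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 32, 32))                # toy RGB input
det, task = two_branch_fcnn(img,
                            rng.standard_normal((8, 3, 3, 3)),
                            rng.standard_normal((1, 8, 3, 3)),
                            rng.standard_normal((1, 8, 3, 3)))
print(det.shape, task.shape)                          # both (1, 32, 32): spatially aligned
```

Because the two heatmaps share one coordinate grid, a face-analysis score can be read out at exactly the locations where the detection branch fires, which is what removes the need for a separate crop-and-normalise step between detection and analysis.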
We present a novel smiling-face detection framework called SmileNet for detecting faces and recognising smiles in the wild. SmileNet uses a Fully Convolutional Neural Network (FCNN) to detect multiple smiling faces in a given image of varying resolution. Our contributions are threefold: 1) SmileNet is the first smiling-face detection network that does not require pre-processing, such as face detection and registration, to generate a normalised (cropped and aligned) input image in advance; 2) SmileNet is a simple, single FCNN architecture that simultaneously performs face detection and smile recognition, which are conventionally treated as separate consecutive pipelines; and 3) SmileNet achieves real-time processing speed (21.15 FPS) even when detecting multiple smiling faces in a given image (300 × 300). Experimental results show that SmileNet delivers state-of-the-art performance (95.76%), even under occlusion and variations in pose, scale, and illumination.
This paper presents an outdoor video dataset annotated with action labels, collected from 24 participants wearing two head-mounted cameras (a GoPro and an SMI eye tracker) while assembling a camping tent. In total, the dataset comprises 5.4 hours of recordings. Tent assembly includes manual interactions with non-rigid objects, such as spreading the tent, securing guylines, reading instructions, and opening a tent bag. An interesting aspect of the dataset is that it reflects participants' proficiency in completing or understanding the task, which leads to differences in action sequences and action durations between participants. Our dataset, called EPIC-Tent, also has several new types of annotations for the two synchronised egocentric videos: task errors, self-rated uncertainty, and gaze position, in addition to the task action labels. We present baseline results on EPIC-Tent using a state-of-the-art method for offline and online action recognition and detection.