Feature point detection based on convolutional neural networks (CNNs) has been studied widely. Effective approaches for improving detection accuracy include building a deeper network or using a multi-network cascade structure. However, some of the potential of CNNs remains untapped. In this study, the authors analyse several factors influencing CNN performance from two aspects: (i) the position relationships between feature points and (ii) the normalisation methods for coordinates. Whether the network can learn these position relationships is also studied. To extract deep image features, a network containing three convolution layers is constructed. Specific geometric relationship constraints are applied during calibration to maximise the CNN's ability to learn the position relationships between feature points. Considering that different feature points appear only in particular local regions of an image, local normalisation is proposed, which enlarges the mapping scope of the feature points and reduces the mapping error. The experimental results show that the specific position relationships and local normalisation markedly improve CNN-based feature point detection. At a detection error of 5%, the average detection accuracy of eyelid feature points is improved by 7.1%, and single-point detection reaches a high accuracy of 97.96%.
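The local normalisation idea described above can be illustrated with a small sketch. This is not the authors' implementation; the region coordinates and helper names below are assumptions used only to show how normalising within a local bounding box (rather than the full image) spreads the regression targets over a wider range.

```python
import numpy as np

def global_normalise(points, img_w, img_h):
    """Map pixel coordinates into [0, 1] over the whole image."""
    return points / np.array([img_w, img_h], dtype=float)

def local_normalise(points, region):
    """Map pixel coordinates into [0, 1] within a local region.

    region: (x0, y0, x1, y1) bounding box in which this feature
    point is known to appear (an assumed input, e.g. an eye region
    produced by a detector).
    """
    x0, y0, x1, y1 = region
    origin = np.array([x0, y0], dtype=float)
    size = np.array([x1 - x0, y1 - y0], dtype=float)
    return (points - origin) / size

# Hypothetical example: eyelid points confined to a small patch
# of a 640x480 frame.
pts = np.array([[210.0, 140.0], [250.0, 150.0]])
g = global_normalise(pts, 640, 480)              # targets crowded together
l = local_normalise(pts, (200, 130, 280, 170))   # targets spread over [0, 1]
print(g)
print(l)
```

Because the locally normalised targets span most of [0, 1] while the globally normalised ones cluster in a narrow band, the same absolute regression error corresponds to a smaller pixel error after de-normalisation, which is consistent with the reduced mapping error the abstract reports.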
Eye movement information is a key cue for recognizing vision-dominated tasks, such as browsing the web or watching a video. However, traditional wearable sensors are invasive, and vision-based eye trackers are expensive and require time-consuming calibration. Therefore, an activity recognition method based on eye movement analysis under a single web camera is proposed for the first time, and its feasibility is assessed. First, an iris tracking method for low-quality images is proposed to acquire eye movement information. Then, ten novel features are extracted from the horizontal and vertical eye movement signals for activity recognition, and the optimal feature subset is selected. Finally, a support vector machine is used to assess the feasibility of the proposed method. Three experiments are designed for different applications: leave-one-out cross-validation, k-fold cross-validation, and validation after respective calibration. Experimental results show accuracies of 68.4%, 79.3% and 84.1%, respectively, which demonstrate the promise of eye-based activity recognition using a single web camera.
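The evaluation pipeline summarised above (features → SVM → leave-one-out cross-validation) can be sketched as follows. This is only an illustrative layout, not the authors' code: the random feature matrix stands in for the real per-recording eye-movement features, and the class count and SVM hyperparameters are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# Illustrative stand-in data: each row would be the feature vector
# extracted from one recording's horizontal/vertical eye-movement
# signals; each label would be the activity performed.
X = rng.normal(size=(30, 10))      # 30 recordings, 10 features
y = rng.integers(0, 3, size=30)    # 3 assumed activity classes

clf = SVC(kernel="rbf", C=1.0)
# Leave-one-out: train on all recordings but one, test on the
# held-out recording, and repeat for every recording.
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"leave-one-out accuracy: {scores.mean():.3f}")
```

The k-fold variant reported in the abstract would swap `LeaveOneOut()` for `KFold(n_splits=k)`; the per-user "validation after respective calibration" setting would instead split each subject's own data into calibration and test portions.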