Deep learning approaches have achieved breakthrough performance in various domains. However, the segmentation of raw eye-movement data into discrete events is still done predominantly either by hand or by algorithms that use hand-picked parameters and thresholds. We propose, and make publicly available, a small one-dimensional convolutional neural network (1D-CNN) combined with a bidirectional long short-term memory (BLSTM) network that classifies gaze samples as fixations, saccades, smooth pursuit, or noise, assigning labels simultaneously to all samples in windows of up to 1 s. In addition to raw gaze coordinates, our approach uses different combinations of gaze speed, direction, and acceleration, each computed at several temporal scales, as input features. We evaluated its performance on a large hand-labeled ground-truth data set (GazeCom) and against 12 reference algorithms. Furthermore, we introduce a novel pipeline and metric for event detection in eye-tracking recordings, which enforce stricter criteria on algorithmically produced events before they are considered potentially correct detections. Results show that our deep approach outperforms all others, including the state-of-the-art multi-observer smooth pursuit detector. We additionally tested our best model on an independent set of recordings, where our approach remains highly competitive with methods from the literature.
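As a rough illustration of the kind of multi-scale input features the abstract describes (gaze speed, direction, and acceleration computed at different temporal scales), the following NumPy sketch derives such features from raw gaze coordinates. The function name, the particular scale set, and the sampling rate are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def gaze_features(x, y, fs=250.0, scales=(1, 2, 4, 8)):
    """Hypothetical multi-scale feature extraction for gaze samples.

    x, y   : gaze coordinates (e.g., in degrees of visual angle)
    fs     : sampling rate in Hz (assumed value, for illustration)
    scales : window lengths in samples over which differences are taken

    Returns an array of shape (n_samples, 3 * len(scales)) holding
    speed, direction, and acceleration at each temporal scale.
    """
    feats = []
    for k in scales:
        # displacement over a window of k samples
        dx = x[k:] - x[:-k]
        dy = y[k:] - y[:-k]
        dt = k / fs
        speed = np.hypot(dx, dy) / dt        # magnitude of gaze velocity
        direction = np.arctan2(dy, dx)       # movement direction, radians
        # finite-difference acceleration of the speed signal
        accel = np.diff(speed, prepend=speed[0]) * fs
        # zero-pad the first k samples so all scales align sample-wise
        pad = np.zeros(k)
        feats += [np.concatenate([pad, speed]),
                  np.concatenate([pad, direction]),
                  np.concatenate([pad, accel])]
    return np.stack(feats, axis=-1)

# Usage: a gaze trace moving at a constant 1 deg/s along the x-axis
fs = 250.0
x = np.arange(100) / fs
y = np.zeros(100)
F = gaze_features(x, y, fs=fs)
print(F.shape)  # (100, 12): 3 features x 4 scales per sample
```

Feature stacks like this would then be fed, window by window, into a sequence model such as the 1D-CNN/BLSTM combination the abstract describes, which emits one event label per sample.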
Our results suggest a restricted, centrally focused visual exploration behavior in patients, not only for pictures but also for movies of real-life scenes. While ETD observed in the laboratory cannot be directly transferred to natural viewing conditions, these alterations support a model of impaired motion-information processing in patients, resulting in a reduced ability to perceive moving objects and less saliency-driven exploration behavior, presumably contributing to alterations in the perception of the natural environment.
Research on eye movements has primarily been performed in two distinct ways: (1) under highly controlled conditions using simple stimuli such as dots on a uniform background, or (2) under free-viewing conditions with complex images, real-world movies, or even with observers moving around in the world. Although both approaches offer important insights, how well eye movement behaviors generalize across these different conditions is unclear. Here, we compared eye movement responses to video clips showing moving objects within their natural context with responses to simple Gaussian blobs on a blank screen. Importantly, for both conditions, the targets moved along the same trajectories at the same speed. We measured standard oculometric measures for both stimulus complexities, as well as the effect of the relative angle between saccades and pursuit, and compared them across conditions. In general, eye movement responses were qualitatively similar, especially with respect to pursuit gain. For both types of stimuli, the accuracy of saccades and subsequent pursuit was highest when both eye movements were collinear. We also found interesting differences; for example, latencies of initial saccades to moving Gaussian blob targets were significantly shorter than those of saccades to moving objects in video scenes, whereas pursuit accuracy was significantly higher in video scenes. These findings suggest a lower processing demand for simple target conditions during saccade preparation and an advantage for tracking behavior in natural scenes due to the higher predictability provided by context information.