2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
DOI: 10.1109/iccvw.2019.00568
RITnet: Real-time Semantic Segmentation of the Eye for Gaze Tracking

Abstract: Accurate eye segmentation can improve eye-gaze estimation and support interactive computing based on visual attention; however, existing eye segmentation methods suffer from issues such as person-dependent accuracy, lack of robustness, and an inability to run in real time. Here, we present the RITnet model, a deep neural network that combines U-Net and DenseNet. RITnet is under 1 MB and achieves 95.3% accuracy on the 2019 OpenEDS Semantic Segmentation challenge. Using a GeForce GTX 1080 Ti, RITnet …
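As a rough sanity check on the sub-1 MB claim, a model's weight footprint can be estimated from its parameter count. The sketch below assumes uncompressed float32 storage (4 bytes per parameter); the parameter counts are illustrative, not RITnet's actual figures:

```python
def model_size_mb(num_params, bytes_per_param=4):
    """Approximate on-disk size of a model's weights in megabytes.

    Assumes uncompressed storage; float32 weights take 4 bytes each.
    """
    return num_params * bytes_per_param / 2**20


# A network with 2**18 (~262K) float32 parameters occupies exactly 1 MB:
print(model_size_mb(2**18))  # → 1.0
# So any architecture kept under ~260K float32 parameters fits the 1 MB budget:
print(model_size_mb(250_000) < 1.0)  # → True
```

This back-of-the-envelope bound is why compact encoder-decoder designs in this space are often reported by parameter count as well as by megabytes.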

Cited by 59 publications (38 citation statements)
References 16 publications
“…Very recently, some authors [23][24][25][26][27] have published improvements to make semantic segmentation more feasible on mobile devices. Those papers also used the OpenEDS dataset from a Facebook competition, creating a new test set and reaching very competitive results.…”
Section: A Segmentation Network
confidence: 99%
“…Stated differently, the metric M is the combination of the mean intersection over union (mIOU) and the model size S in megabytes. Generally, our approach achieves a competitive result with less than half the number of trainable parameters compared to the best result on the OpenEDS dataset, as shown in Table 2. In terms of speed, our system took only 16.56 seconds while RITnet [29] took 22.75 seconds to iterate over a set of 1,440 test images on an NVIDIA 1080Ti GPU. A comparison between our predictions and those of RITnet [29] is shown in Figure 8.…”
Section: B Evaluation
confidence: 99%
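The mIOU term in that metric is the standard per-class intersection-over-union averaged over classes. A minimal sketch on flat per-pixel label sequences follows; the class indexing and the convention of skipping absent classes are assumptions, not the challenge's exact evaluation code:

```python
def miou(pred, target, num_classes):
    """Mean intersection-over-union between two flat label sequences.

    pred, target: equal-length sequences of integer class labels, one per pixel.
    Classes absent from both pred and target are skipped in the average.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both predict and ground truth agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either side assigns class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy 4-pixel example with two classes:
print(miou([0, 0, 1, 1], [0, 1, 1, 1], 2))  # → 0.5833333333333333 (mean of 1/2 and 2/3)
```

For eye segmentation the classes would be background, sclera, iris, and pupil, so `num_classes=4` in that setting.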
“…Perry and Fernandez [26] leveraged dilated and asymmetric convolutions, while Kansal and Devanathan [27] utilized a squeeze-and-excitation [16] block as well as spatial attention on channel attention [28]. Chaudhary et al. [29] presented an architecture based on DenseNet [30] and U-Net [15]. They applied numerous augmentation operations during training, such as Gaussian blur, image translation, and corruption.…”
Section: Introduction
confidence: 99%
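Of the augmentations listed, image translation is the simplest to sketch. Below is a minimal pure-Python version operating on a 2-D image given as a list of rows; the shift amounts and zero fill value are illustrative choices, not RITnet's actual training parameters:

```python
def translate(img, dx, dy, fill=0):
    """Shift a 2-D image (list of rows) right by dx and down by dy.

    Pixels shifted in from outside the frame are set to `fill`;
    pixels shifted out of the frame are discarded.
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


# Shift a 2x2 image one pixel to the right; the left column is zero-filled:
print(translate([[1, 2], [3, 4]], dx=1, dy=0))  # → [[0, 1], [0, 3]]
```

For segmentation training, the same shift must be applied to the image and its label mask together so pixel labels stay aligned with pixel intensities.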
“…Feature- and model-based eye tracking systems have proven to be simpler and more accurate approaches and have become the consensus solution [1,13]. Works applying machine learning techniques for semantic segmentation [14,15,16] or pupil center detection [17,18] in these controlled environments can be found. The use of convolutional neural networks (CNNs) has proven to be a robust solution for pupil center detection in challenging images with artifacts due to poor illumination, reflections, or pupil occlusion [17,18].…”
Section: Introduction
confidence: 99%