Humans rely on multiple sensory modalities when examining and reasoning about images. In this paper, we describe a new multimodal dataset that consists of gaze measurements and spoken descriptions collected in parallel during an image inspection task. The task was performed by multiple participants on 100 general-domain images showing everyday objects and activities. We demonstrate the usefulness of the dataset by applying an existing visual-linguistic data fusion framework to label important image regions with appropriate linguistic labels.
Understanding and characterizing perceptual expertise is a major bottleneck in developing intelligent systems. In knowledge-rich domains such as dermatology, perceptual expertise influences the diagnostic inferences made based on the visual input. This study uses eye movement data from 12 dermatology experts and 12 undergraduate novices while they inspected 34 dermatological images. This work investigates the differences in global and local temporal fixation patterns between the two groups using recurrence quantification analysis (RQA). The RQA measures reveal significant differences in both global and local temporal patterns between the two groups. Results show that experts tended to refixate previously inspected areas less often than did novices, and their refixations were more widely separated in time. Experts were also less likely to follow extended scan paths repeatedly than were novices. These results suggest the potential value of RQA measures in characterizing perceptual expertise. We also discuss potential use of the RQA method in understanding the interactions between experts' visual and linguistic behavior.
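To make the RQA measures referred to above concrete, the following is a minimal sketch of fixation-based recurrence quantification analysis, assuming fixations are given as (x, y) pixel coordinates. The radius, the minimum line length, and the toy data are illustrative assumptions, not values from the study; recurrence rate corresponds to how often areas are refixated, the center of recurrence mass (CORM) to how widely separated in time refixations are, and determinism to repeated sub-scanpaths.

```python
import numpy as np

def rqa_measures(fixations, radius=64.0, min_line=2):
    """Compute simple RQA measures over a fixation sequence (illustrative)."""
    fix = np.asarray(fixations, dtype=float)
    n = len(fix)
    # Recurrence matrix: fixations i and j "recur" if they fall within `radius`.
    dists = np.linalg.norm(fix[:, None, :] - fix[None, :, :], axis=-1)
    rec = dists <= radius
    iu = np.triu_indices(n, k=1)          # count each pair (i < j) once
    recurrent = rec[iu]
    r = int(recurrent.sum())
    total_pairs = n * (n - 1) / 2
    recurrence_rate = 100.0 * r / total_pairs if total_pairs else 0.0
    # CORM: mean temporal separation (lag) of recurrent refixations,
    # normalized by the maximum possible lag.
    lags = (iu[1] - iu[0])[recurrent]
    corm = 100.0 * lags.mean() / (n - 1) if r else 0.0
    # Determinism: share of recurrent points lying on diagonal line segments
    # of length >= min_line, i.e. repeated extended scan paths.
    det_points = 0
    for i, j in zip(iu[0], iu[1]):
        if not rec[i, j]:
            continue
        length = 1
        a, b = i - 1, j - 1               # extend the diagonal backward
        while a >= 0 and rec[a, b]:
            length += 1
            a -= 1
            b -= 1
        a, b = i + 1, j + 1               # and forward
        while b < n and rec[a, b]:
            length += 1
            a += 1
            b += 1
        if length >= min_line:
            det_points += 1
    determinism = 100.0 * det_points / r if r else 0.0
    return {"recurrence": recurrence_rate, "corm": corm, "determinism": determinism}

# Toy usage with made-up fixation coordinates.
print(rqa_measures([(100, 100), (400, 300), (110, 95), (500, 500), (405, 310)]))
```

Group-level comparisons (e.g., experts vs. novices) would then aggregate these per-trial measures across participants and images.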
Human image understanding is reflected by individuals' visual and linguistic behaviors, but the meaningful computational integration and interpretation of their multimodal representations remain a challenge. In this paper, we expand a framework for capturing image-region annotations in dermatology, a domain in which interpreting an image is influenced by experts' visual perception skills, conceptual domain knowledge, and task-oriented goals. Our work explores the hypothesis that eye movements can help us understand experts' perceptual processes and that spoken language descriptions can reveal conceptual elements of image inspection tasks. We cast the problem of meaningfully integrating visual and linguistic data as unsupervised bitext alignment. Using alignment, we create meaningful mappings between physicians' eye movements, which reveal key areas of images, and spoken descriptions of those images. The resulting alignments are then used to annotate image regions with medical concept labels. Our alignment accuracy exceeds baselines using both exact and delayed temporal correspondence. Additionally, a comparison of alignment accuracy between a method that identifies image clusters from eye movements and a method that identifies clusters from image features suggests that the two approaches perform well on different types of images and concept labels. This indicates that an image annotation framework should integrate information from more than one technique to handle heterogeneous images. We also investigate the performance of the proposed aligner for dermatological primary morphology concept labels, as well as for categories of images based on lesion size or type and on distribution.
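To illustrate the bitext-alignment framing described above, here is a minimal sketch that treats each image as a "sentence pair": a sequence of gaze-cluster IDs (regions the physician fixated) and the sequence of concept labels extracted from the spoken description. The IBM Model 1 EM procedure, the cluster IDs, and the toy label vocabulary below are generic stand-ins for illustration, not the aligner or data used in the paper.

```python
from collections import defaultdict
from itertools import product

def ibm_model1(sentence_pairs, iterations=10):
    """Learn p(label | region) from parallel region/label sequences via EM."""
    regions = {r for rs, _ in sentence_pairs for r in rs}
    labels = {l for _, ls in sentence_pairs for l in ls}
    # Uniform initialization of translation probabilities.
    t = {(l, r): 1.0 / len(labels) for l, r in product(labels, regions)}
    for _ in range(iterations):
        count = defaultdict(float)   # expected label/region co-occurrence counts
        total = defaultdict(float)   # per-region normalizers
        for rs, ls in sentence_pairs:
            for l in ls:
                z = sum(t[(l, r)] for r in rs)        # E-step: soft alignment
                for r in rs:
                    count[(l, r)] += t[(l, r)] / z
                    total[r] += t[(l, r)] / z
        for (l, r) in t:                              # M-step: re-estimate
            if total[r] > 0:
                t[(l, r)] = count[(l, r)] / total[r]
    return t

def annotate(regions, labels, t):
    """Annotate each gaze cluster with its most probable concept label."""
    return {r: max(labels, key=lambda l: t[(l, r)]) for r in regions}

# Toy usage: two images, gaze clusters c0..c2, and spoken concept labels.
pairs = [(["c0", "c1"], ["plaque", "erythema"]),
         (["c1", "c2"], ["erythema", "scale"])]
t = ibm_model1(pairs)
print(annotate(["c0", "c1"], ["plaque", "erythema", "scale"], t))
```

The learned translation table plays the role of the alignment model: regions that reliably co-occur with a concept label across images end up annotated with that label.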
Multimodal integration of visual and linguistic data is a longstanding and crucial challenge for modeling human understanding. We propose a framework that uses an unsupervised bitext alignment method to integrate visual and linguistic data. We present an empirical study of the framework's parameters. Our results exceed baselines using both exact and delayed temporal correspondence. The resulting alignments can be used for image classification and retrieval.
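For intuition about the exact and delayed temporal-correspondence baselines mentioned above, the following is a minimal sketch, assuming fixations and spoken words carry timestamps in seconds. The one-second eye-voice delay and the data structures are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fixation:
    cluster: str   # gaze-cluster / image-region ID
    start: float
    end: float

@dataclass
class Word:
    text: str
    onset: float

def fixated_cluster(fixations: List[Fixation], time: float) -> Optional[str]:
    """Return the region being fixated at `time`, if any."""
    for f in fixations:
        if f.start <= time <= f.end:
            return f.cluster
    return None

def temporal_baseline(fixations, words, delay=0.0):
    """Pair each spoken word with the region fixated `delay` seconds earlier."""
    return {w.text: fixated_cluster(fixations, w.onset - delay) for w in words}

# Toy usage: exact correspondence (delay=0) vs. a delayed baseline that assumes
# the eyes lead the voice by roughly one second.
fix = [Fixation("c0", 0.0, 1.2), Fixation("c1", 1.2, 2.5)]
words = [Word("plaque", 1.5), Word("erythema", 2.2)]
print(temporal_baseline(fix, words, delay=0.0))
print(temporal_baseline(fix, words, delay=1.0))
```

An unsupervised aligner can outperform such baselines because it is not tied to a single fixed lag between looking at a region and naming it.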
Digital medical image data are growing rapidly in both quantity and heterogeneity. There is a great need to organize medical image archives so as to facilitate diagnostics and preventive medicine. To this end, great effort has been invested over the past few decades in applying content-based image retrieval (CBIR) techniques to medical images. However, several critical challenges remain. Recently, CBIR research has become intertwined with the fundamental problem of image understanding, and it is now recognized that computing solutions that bridge the "semantic gap" must capture the higher-level domain knowledge of medical end users. We are investigating the incorporation of state-of-the-art visual categorization techniques into conventional CBIR approaches. The visual attention deployment strategies of medical experts serve as an objective measure that helps us understand the perceptual and conceptual processes involved in identifying key visual features and selecting diagnostic regions of images. Understanding these processes will inform and direct feature selection for medical images, such as the dermatological images used in our study. We also explore systematic and effective methods for integrating image data and semantic descriptions, with the long-term goal of building efficient, human-centered, multimodal interactive CBIR systems.
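As a minimal sketch of the conventional CBIR step discussed above: represent each archive image by a global feature vector and rank images by similarity to a query. The color-histogram features, cosine ranking, and random toy images below are generic illustrations, not the descriptors or data used in the study.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Per-channel color histogram for an RGB image array, L1-normalized."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 255))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def retrieve(query_feat, archive_feats, k=3):
    """Return indices of the k archive images most similar to the query."""
    a = np.asarray(archive_feats)
    sims = a @ query_feat / (np.linalg.norm(a, axis=1) * np.linalg.norm(query_feat))
    return np.argsort(-sims)[:k]

# Toy usage with random "images"; in practice these would be dermatological
# images from the archive, and the histogram would be replaced by features
# informed by experts' visual attention.
rng = np.random.default_rng(0)
archive = [rng.integers(0, 256, size=(64, 64, 3)) for _ in range(5)]
feats = [color_histogram(im) for im in archive]
query = color_histogram(archive[2])
print(retrieve(query, feats))
```

Eye-movement data from experts would enter this pipeline at the feature-selection stage, weighting or restricting features toward diagnostically relevant regions rather than the whole image.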