We propose a definition of saliency by considering what the visual system is trying to optimize when directing attention. The resulting model is a Bayesian framework from which bottom-up saliency emerges naturally as the self-information of visual features, and overall saliency (incorporating top-down information with bottom-up saliency) emerges as the pointwise mutual information between the features and the target when searching for a target. An implementation of our framework demonstrates that our model's bottom-up saliency maps perform as well as or better than existing algorithms in predicting people's fixations in free viewing. Unlike existing saliency measures, which depend on the statistics of the particular image being viewed, our measure of saliency is derived from natural image statistics, obtained in advance from a collection of natural images. For this reason, we call our model SUN (Saliency Using Natural statistics). A measure of saliency based on natural image statistics, rather than based on a single test image, provides a straightforward explanation for many search asymmetries observed in humans; the statistics of a single test image lead to predictions that are not consistent with these asymmetries. In our model, saliency is computed locally, which is consistent with the neuroanatomy of the early visual system and results in an efficient algorithm with few free parameters.
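As a sketch of the quantities named above (in our own notation, not necessarily that of the paper), let F denote the local feature vector and T = 1 the event that the search target is present at a location. The bottom-up and overall saliency described in the abstract can then be written as

    s_{\mathrm{bu}}(f) = -\log p(F = f)

    s_{\mathrm{overall}}(f) = \log \frac{p(F = f,\, T = 1)}{p(F = f)\, p(T = 1)} = \log p(F = f \mid T = 1) - \log p(F = f)

where the first expression is the self-information of the features under natural image statistics and the second is the pointwise mutual information between features and target; when no target is specified, the likelihood term drops out and overall saliency reduces to the bottom-up term.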
When people try to find particular objects in natural scenes, they make extensive use of knowledge about how and where objects tend to appear in a scene. Although many forms of such "top-down" knowledge have been incorporated into saliency map models of visual search, surprisingly, the role of object appearance has rarely been investigated. Here we present an appearance-based saliency model derived in a Bayesian framework. We compare our approach with both bottom-up saliency algorithms and the state-of-the-art Contextual Guidance model of Torralba et al. (2006) at predicting human fixations. Although the two top-down approaches use very different types of information, they achieve similar performance; each is substantially better than the purely bottom-up models. Our experiments reveal that a simple model of object appearance can predict human fixations quite well, even making the same mistakes as people.
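The abstract does not spell out the computation, but one minimal way to realize an appearance-based Bayesian saliency map is as a log-likelihood ratio between a target-appearance model and a background model over local features. The Gaussian class models and function below are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from scipy.stats import multivariate_normal

    def appearance_saliency(features, target_mean, target_cov, bg_mean, bg_cov):
        # features: (H, W, D) array of local descriptors (e.g. filter responses).
        # The target and background Gaussians are assumed to have been fit
        # offline on labeled patches of the sought object class and of generic scenes.
        flat = features.reshape(-1, features.shape[-1])
        log_p_target = multivariate_normal.logpdf(flat, mean=target_mean, cov=target_cov)
        log_p_bg = multivariate_normal.logpdf(flat, mean=bg_mean, cov=bg_cov)
        # High values: the local appearance is better explained by the target model
        # than by the generic background model.
        return (log_p_target - log_p_bg).reshape(features.shape[:2])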
We developed a rich dataset of Chest X-Ray (CXR) images to assist investigators in artificial intelligence. The data were collected using an eye-tracking system while a radiologist reviewed and reported on 1,083 CXR images. The dataset contains the following aligned data: the CXR image, the transcribed radiology report text, the radiologist's dictation audio, and the eye-gaze coordinate data. We hope this dataset can contribute to various areas of research, particularly explainable and multimodal deep learning/machine learning methods. Furthermore, investigators in disease classification and localization, automated radiology report generation, and human-machine interaction can benefit from these data. We report deep learning experiments that utilize attention maps produced from the eye-gaze data to show the potential utility of this dataset.
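As an illustration of how such gaze data can be turned into the attention maps mentioned above, the sketch below bins raw gaze coordinates into a fixation heatmap aligned with the image and smooths it. The (x, y) ordering, pixel units, and smoothing width are assumptions rather than the dataset's actual schema.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gaze_to_attention_map(gaze_xy, image_shape, sigma_px=25):
        # gaze_xy: iterable of (x, y) pixel coordinates of gaze samples.
        # Returns a heatmap normalized to sum to 1.
        heat = np.zeros(image_shape, dtype=np.float64)
        for x, y in gaze_xy:
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < image_shape[0] and 0 <= xi < image_shape[1]:
                heat[yi, xi] += 1.0
        heat = gaussian_filter(heat, sigma=sigma_px)
        total = heat.sum()
        return heat / total if total > 0 else heat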
The role of memory in guiding attention allocation in daily behaviors is not well understood. In experiments with two-dimensional (2D) images, there is mixed evidence about the importance of memory. Because the stimulus context in laboratory experiments differs extensively from that of daily behaviors, we investigated the role of memory in visual search in both 2D and three-dimensional (3D) environments. A 3D immersive virtual apartment composed of two rooms was created, and a parallel 2D visual search experiment using snapshots from the 3D environment was developed. Eye movements were tracked in both experiments. Repeated searches for geometric objects were performed to assess the role of spatial memory. Subsequently, subjects searched for realistic context objects to test for incidental learning. Our results show that subjects learned the room-target associations in 3D but less so in 2D. Gaze was increasingly restricted to relevant regions of the room with experience in both settings. Search for local contextual objects, however, was not facilitated by early experience. Incidental fixations to context objects do not necessarily benefit search performance. Together, these results demonstrate that memory for global aspects of the environment guides search by restricting allocation of attention to likely regions, whereas task relevance determines what is learned from the active search experience. Behaviors in 2D and 3D environments are comparable, although there is greater use of memory in 3D.
While it is universally acknowledged that both bottom-up and top-down factors contribute to the allocation of gaze, we currently have a limited understanding of how top-down factors determine gaze choices in the context of ongoing natural behavior. One purely top-down model, by Sprague, Ballard, and Robinson (2007), suggests that natural behaviors can be understood in terms of simple component behaviors, or modules, that are executed according to their reward value, with gaze targets chosen in order to reduce uncertainty about the particular world state needed to execute those behaviors. We explore the plausibility of the central claims of this approach in the context of a task where subjects walk through a virtual environment performing interception, avoidance, and path-following tasks. Many aspects of both walking-direction choices and gaze allocation are consistent with this approach. Subjects use gaze to reduce uncertainty about task-relevant information that is used to inform action choices. Notably, the addition of motion to peripheral objects did not affect fixations when the objects were irrelevant to the task, suggesting that stimulus saliency was not a major factor in gaze allocation. The modular approach of independent component behaviors is consistent with the main aspects of performance, but there were a number of deviations suggesting that modules interact. Thus, the model forms a useful, but incomplete, starting point for understanding top-down factors in active behavior.
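A toy paraphrase of the gaze-scheduling idea attributed to the modular account (our illustration, not the authors' implementation): each component behavior carries a reward value and an estimate of its current state uncertainty, and gaze goes to the behavior for which unresolved uncertainty is most costly.

    def choose_gaze_target(modules):
        # modules: list of dicts with keys 'name', 'reward', 'uncertainty'
        # (e.g. the variance of the module's estimate of its relevant world state).
        return max(modules, key=lambda m: m["reward"] * m["uncertainty"])["name"]

    # Example: an interception module whose state estimate has drifted wins gaze
    # over lower-stakes or better-resolved modules.
    modules = [
        {"name": "path_following", "reward": 1.0, "uncertainty": 0.2},
        {"name": "obstacle_avoidance", "reward": 2.0, "uncertainty": 0.1},
        {"name": "interception", "reward": 1.5, "uncertainty": 0.6},
    ]
    print(choose_gaze_target(modules))  # -> "interception"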
What is the role of the Fusiform Face Area (FFA)? Is it specific to face processing, or is it a visual expertise area? The expertise hypothesis is appealing due to a number of studies showing that the FFA is activated by pictures of objects within the subject's domain of expertise (e.g., cars for car experts, birds for birders, etc.), and that activation of the FFA increases as new expertise is acquired in the lab. However, it is incumbent upon the proponents of the expertise hypothesis to explain how it is that an area that is initially specialized for faces becomes recruited for new classes of stimuli. We dub this the "visual expertise mystery." One suggested answer to this mystery is that the FFA is used simply because it is a fine discrimination area, but this account has historically lacked a mechanism describing exactly how the FFA would be recruited for novel domains of expertise.
We have been developing techniques for extracting general world knowledge from miscellaneous texts by a process of approximate interpretation and abstraction, focusing initially on the Brown corpus. We apply interpretive rules to clausal patterns and patterns of modification, and concurrently abstract general "possibilistic" propositions from the resulting formulas. Two examples are "A person may believe a proposition" and "Children may live with relatives". Our methods currently yield over 117,000 such propositions (of variable quality) for the Brown corpus (more than 2 per sentence). We report here on our efforts to evaluate these results with a judging scheme aimed at determining how many of these propositions pass muster as "reasonable general claims" about the world in the opinion of human judges. We find that nearly 60% of the extracted propositions are favorably judged according to our scheme by any given judge. The percentage unanimously judged to be reasonable claims by multiple judges is lower, but still sufficiently high to suggest that our techniques may be of some use in tackling the long-standing "knowledge acquisition bottleneck" in AI.
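As a toy illustration of the abstraction step described above (ours, not the authors' actual extraction system), the snippet below maps a parsed clause to a generic "possibilistic" claim by replacing its arguments with coarse types drawn from a small, invented lexicon.

    # Invented mini-lexicon mapping specific words to coarse semantic types.
    TYPE_OF = {"John": "person", "claim": "proposition", "story": "proposition"}

    def abstract_clause(subject, verb, obj):
        # Replace the specific arguments with their types, keep the verb,
        # and hedge the result as a "may" (possibilistic) claim.
        subj_type = TYPE_OF.get(subject, subject)
        obj_type = TYPE_OF.get(obj, obj)
        return f"A {subj_type} may {verb} a {obj_type}"

    print(abstract_clause("John", "believe", "claim"))
    # -> "A person may believe a proposition"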