Vision not only detects and recognizes objects, but also performs rich inferences about the underlying scene structure that causes the patterns of light we see. Inverting generative models, or “analysis-by-synthesis”, presents a possible solution, but its mechanistic implementations have typically been too slow for online perception, and their mapping to neural circuits remains unclear. Here we present a neurally plausible, efficient inverse graphics model and test it in the domain of face recognition. The model is based on a deep neural network that learns to invert a three-dimensional face graphics program in a single fast feedforward pass. It explains human behavior qualitatively and quantitatively, including the classic “hollow face” illusion, and it maps directly onto a specialized face-processing circuit in the primate brain. The model fits both behavioral and neural data better than state-of-the-art computer vision models, and suggests an interpretable reverse-engineering account of how the brain transforms images into percepts.
Linguistic meaning has long been recognized to be highly context-dependent. Quantifiers like many and some provide a particularly clear example of context-dependence. For example, the interpretation of quantifiers requires listeners to determine the relevant domain and scale. We focus on another type of context-dependence that quantifiers share with other lexical items: talker variability. Different talkers might use quantifiers with different interpretations in mind. We used a web-based crowdsourcing paradigm to study participants’ expectations about the use of many and some based on recent exposure. We first established that the mapping of some and many onto quantities (candies in a bowl) is variable both within and between participants. We then examined whether and how listeners’ expectations about quantifier use adapt with exposure to talkers who use quantifiers in different ways. The results demonstrate that listeners can adapt to talker-specific biases in both how often and with what intended meaning many and some are used.
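The talker-specific adaptation described in the abstract above can be pictured with a toy belief-updating sketch. Below, a listener tracks a single talker's tendency to say many versus some using Beta-Bernoulli pseudo-counts; the class name, the prior values, and the omission of the underlying quantity scale are our simplifying assumptions, not the paper's model.

```python
import random

class TalkerQuantifierModel:
    """Hypothetical sketch of talker-specific adaptation: Beta-Bernoulli
    pseudo-counts track how often this talker says 'many' vs. 'some'.
    Deliberately ignores the quantity scale the real task manipulates."""

    def __init__(self, prior_many=1.0, prior_some=1.0):
        # Symmetric Beta(1, 1) prior: no initial talker-specific bias.
        self.counts = {"many": prior_many, "some": prior_some}

    def observe(self, quantifier):
        # One exposure trial: the talker used this quantifier.
        self.counts[quantifier] += 1.0

    def p_many(self):
        # Posterior predictive probability that the talker says 'many'.
        return self.counts["many"] / (self.counts["many"] + self.counts["some"])

# Exposure to a talker who uses 'many' liberally shifts expectations.
random.seed(0)
listener = TalkerQuantifierModel()
for _ in range(20):
    listener.observe("many" if random.random() < 0.8 else "some")
print(f"adapted P(many) = {listener.p_many():.2f}")  # near 0.8
```

With a symmetric prior, twenty exposures to a talker who says many on 80% of trials pull the listener's predictive probability most of the way toward that bias, which is the qualitative pattern the abstract reports.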
Humans can easily describe, imagine, and, crucially, predict a wide variety of behaviors of liquids—splashing, squirting, gushing, sloshing, soaking, dripping, draining, trickling, pooling, and pouring—despite tremendous variability in their material and dynamical properties. Here we propose and test a computational model of how people perceive and predict these liquid dynamics, based on coarse approximate simulations of fluids as collections of interacting particles. Our model is analogous to a “game engine in the head”, drawing on techniques for interactive simulations (as in video games) that optimize for efficiency and natural appearance rather than physical accuracy. In two behavioral experiments, we found that the model accurately captured people’s predictions about how liquids flow among complex solid obstacles, and was significantly better than several alternatives based on simple heuristics and deep neural networks. Our model was also able to explain how people’s predictions varied as a function of the liquids’ properties (e.g., viscosity and stickiness). Together, the model and empirical results extend the recent proposal that human physical scene understanding for the dynamics of rigid, solid objects can be supported by approximate probabilistic simulation, to the more complex and unexplored domain of fluid dynamics.
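A minimal version of such a coarse particle simulation, in the interactive-game-engine spirit the abstract describes, might look like the sketch below. The gravity step, the velocity-averaging term standing in for viscosity, and the surface damping standing in for stickiness are illustrative assumptions, not the calibrated simulator used in the experiments.

```python
import numpy as np

def simulate_liquid(pos, vel, steps=200, dt=0.01, viscosity=0.1,
                    stickiness=0.0, floor=0.0):
    """Coarse particle simulation in the 'game engine in the head'
    spirit: plausibility and speed over physical accuracy. The
    viscosity and stickiness terms are illustrative stand-ins."""
    gravity = np.array([0.0, -9.8])
    for _ in range(steps):
        vel = vel + dt * gravity                 # external forces
        # Crude viscosity: damp velocities toward the mean so the
        # blob moves more coherently when viscosity is high.
        vel = vel - viscosity * (vel - vel.mean(axis=0))
        pos = pos + dt * vel
        below = pos[:, 1] < floor                # collide with the floor
        pos[below, 1] = floor
        vel[below, 1] *= -0.2                    # mostly inelastic bounce
        vel[below, 0] *= 1.0 - stickiness        # sticky surfaces stop flow
    return pos, vel

# Drop a small blob of 'liquid' and let it pool on the floor.
rng = np.random.default_rng(0)
pos = rng.uniform([0.4, 0.8], [0.6, 1.0], size=(50, 2))
pos, vel = simulate_liquid(pos, np.zeros_like(pos))
print("mean height after settling:", pos[:, 1].mean().round(3))
```

Trading physical accuracy for speed and plausible appearance in this way is exactly the design choice the "game engine in the head" proposal highlights.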
People learn modality-independent, conceptual representations from modality-specific sensory signals. Here, we hypothesize that any system that accomplishes this feat will include three components: a representational language for characterizing modality-independent representations, a set of sensory-specific forward models for mapping from modality-independent representations to sensory signals, and an inference algorithm for inverting forward models—that is, an algorithm for using sensory signals to infer modality-independent representations. To evaluate this hypothesis, we instantiate it in the form of a computational model that learns object shape representations from visual and/or haptic signals. The model uses a probabilistic grammar to characterize modality-independent representations of object shape, uses a computer graphics toolkit and a human hand simulator to map from object representations to visual and haptic features, respectively, and uses a Bayesian inference algorithm to infer modality-independent object representations from visual and/or haptic signals. Simulation results show that the model infers identical object representations when an object is viewed, grasped, or both. That is, the model’s percepts are modality invariant. We also report the results of an experiment in which different subjects rated the similarity of pairs of objects in different sensory conditions, and show that the model provides a very accurate account of subjects’ ratings. Conceptually, this research significantly contributes to our understanding of modality invariance, an important type of perceptual constancy, by demonstrating how modality-independent representations can be acquired and used. Methodologically, it provides an important contribution to cognitive modeling, particularly an emerging probabilistic language-of-thought approach, by showing how symbolic and statistical approaches can be combined in order to understand aspects of human perception.
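The three hypothesized components can be sketched directly, with toy stand-ins for the paper's shape grammar, its graphics and hand simulators, and its Bayesian inference algorithm; every function below is our simplified invention, and the forward models are deliberately trivial.

```python
import math
import random

def sample_shape():
    """Grammar stand-in: a modality-independent shape is a short list of
    part sizes (the real grammar generates structured part trees)."""
    return [random.uniform(0.5, 2.0) for _ in range(random.randint(1, 3))]

def forward_visual(shape):
    # Visual forward model stand-in for the graphics toolkit.
    return shape

def forward_haptic(shape):
    # Haptic forward model stand-in for the hand simulator.
    return shape

def gauss_lik(obs, pred, sd):
    """Gaussian likelihood of observed features given predictions; an
    absent modality (obs is None) contributes no evidence."""
    if obs is None:
        return 1.0
    if len(obs) != len(pred):
        return 0.0
    return math.prod(math.exp(-(o - p) ** 2 / (2 * sd ** 2))
                     for o, p in zip(obs, pred))

def infer(visual=None, haptic=None, n=20000):
    """Inference stand-in: invert the forward models by sampling shapes
    from the grammar and keeping the best explanation of whichever
    sensory signals are present."""
    best, best_w = None, 0.0
    for _ in range(n):
        shape = sample_shape()
        w = (gauss_lik(visual, forward_visual(shape), 0.1) *
             gauss_lik(haptic, forward_haptic(shape), 0.2))
        if w > best_w:
            best, best_w = shape, w
    return best

# The same routine yields matching percepts whether the object is seen,
# grasped, or both -- the modality invariance described above.
random.seed(1)
seen = [1.0, 1.6]
print(infer(visual=seen), infer(haptic=seen), infer(visual=seen, haptic=seen))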
We thank S. Piantadosi for many helpful discussions, and the editor and anonymous reviewers for constructive comments on this manuscript. This work was supported by research grants from the National Science Foundation (DRL-0817250, BCS-1400784) and the Air Force Office of Scientific Research (FA9550-12-1-0303).

If a person is trained to recognize or categorize objects or events using one sensory modality, the person can often recognize or categorize those same (or similar) objects and events via a novel modality. This phenomenon is an instance of cross-modal transfer of knowledge. Here, we study the Multisensory Hypothesis, which states that people extract the intrinsic, modality-independent properties of objects and events, and represent these properties in multisensory representations. These representations underlie cross-modal transfer of knowledge. We conducted an experiment evaluating whether people transfer sequence category knowledge across auditory and visual domains. Our experimental data clearly indicate that we do. We also developed a computational model accounting for our experimental results. Consistent with the probabilistic language of thought approach to cognitive modeling, our model formalizes multisensory representations as symbolic "computer programs" and uses Bayesian inference to learn these representations. Because the model demonstrates how the acquisition and use of amodal, multisensory representations can underlie cross-modal transfer of knowledge, and because the model accounts for subjects' experimental performances, our work lends credence to the Multisensory Hypothesis. Overall, our work suggests that people automatically extract and represent objects' and events' intrinsic properties, and use these properties to process and understand the same (and similar) objects and events when they are perceived through novel sensory modalities.
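One way to picture the model's symbolic "computer programs" is the hypothetical sketch below: a sequence category is a modality-independent program, raw events from either modality are first mapped to abstract symbols, and the same program then classifies sequences of tones or of flashes. The two categories, the symbol mapping, and the stimulus values are our toy assumptions, not the paper's stimuli or grammar.

```python
# Sequence categories as modality-independent symbolic "programs".
PROGRAMS = {
    "alternating": lambda seq: all(a != b for a, b in zip(seq, seq[1:])),
    "repeating":   lambda seq: all(a == b for a, b in zip(seq, seq[1:])),
}

def to_amodal(events, boundary):
    """Strip modality: map raw magnitudes (pitch in audition, brightness
    in vision) onto the abstract symbols the programs operate on."""
    return ["A" if e < boundary else "B" for e in events]

def classify(events, boundary):
    """With a uniform prior over programs, Bayesian classification
    reduces to checking which programs explain the amodal sequence."""
    seq = to_amodal(events, boundary)
    return [name for name, prog in PROGRAMS.items() if prog(seq)]

# Learn a category from tones, then transfer it to flashes.
tones = [220, 880, 220, 880]             # pitch in Hz
flashes = [0.1, 0.9, 0.1, 0.9]           # relative brightness
print(classify(tones, boundary=500))     # ['alternating']
print(classify(flashes, boundary=0.5))   # same category, novel modality
```

Because the program operates on amodal symbols, knowledge acquired from auditory sequences applies unchanged to visual ones, which is the cross-modal transfer the experiment tests.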
Vision must not only recognize and localize objects, but perform richer inferences about the underlying causes in the world that give rise to sensory data. How the brain performs these inferences remains unknown: Theoretical proposals based on inverting generative models (or "analysis-by-synthesis") have a long history but their mechanistic implementations have typically been too slow to support online perception, and their mapping to neural circuits is unclear. Here we present a neurally plausible model for efficiently inverting generative models of images and test it as an account of one high-level visual capacity, the perception of faces. The model is based on a deep neural network that learns to invert a three-dimensional (3D) face graphics program in a single fast feedforward pass. It explains both human behavioral data and multiple levels of neural processing in non-human primates, as well as a classic illusion, the "hollow face" effect. The model fits qualitatively better than state-of-the-art computer vision models, and suggests an interpretable reverse-engineering account of how images are transformed into percepts in the ventral stream.
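The core computational move here, amortizing the inversion of a generative graphics model into a single feedforward pass, can be sketched in a few lines. In the sketch below, a toy linear renderer stands in for the 3D face graphics program and a ridge-regression read-out stands in for the deep recognition network; only the pattern of training the inverse on the generative model's own samples is the point.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 64))  # fixed 'graphics' mixing weights

def render(latents):
    """Toy stand-in for the 3D face graphics program: scene latents
    (e.g., shape, texture, pose coefficients) produce an image."""
    return latents @ A + 0.05 * rng.standard_normal(64)

# Training data for the recognition model comes from the generative
# model itself: sample scenes, render them, learn to go backwards.
Z = rng.uniform(-1, 1, size=(2000, 3))       # scene latents
X = np.stack([render(z) for z in Z])         # rendered 'images'

# 'Training the recognition network': here, ridge-regularized least
# squares in place of the paper's deep network.
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(64), X.T @ Z)

# Perception is now one fast feedforward pass, with no iterative
# search through scene space at inference time.
z_true = np.array([0.3, -0.5, 0.8])
print("inferred:", (render(z_true) @ W).round(2), "true:", z_true)
```

The speed advantage over classical analysis-by-synthesis comes from paying the search cost once, at training time, rather than on every image.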
How do people learn multisensory, or amodal, representations, and what consequences do these representations have for perceptual performance? We address this question by performing a rational analysis of the problem of learning multisensory representations. This analysis makes use of a Bayesian nonparametric model that acquires latent multisensory features that optimally explain the unisensory features arising in individual sensory modalities. The model qualitatively accounts for several important aspects of multisensory perception: (a) it integrates information from multiple sensory sources in such a way that it leads to superior performances in, for example, categorization tasks; (b) its performances suggest that multisensory training leads to better learning than unisensory training, even when testing is conducted in unisensory conditions; (c) its multisensory representations are modality invariant; and (d) it predicts ''missing'' sensory representations in modalities when the input to those modalities is absent. Our rational analysis indicates that all of these aspects emerge as part of the optimal solution to the problem of learning to represent complex multisensory environments.
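A linear-Gaussian factor model fit by SVD can serve as a rough stand-in for the Bayesian nonparametric model in this rational analysis; the sketch below uses it to illustrate points (c) and (d), a shared modality-invariant representation and the prediction of a "missing" modality. The dimensions, noise levels, and the SVD-based learner are all our assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
K, Dv, Da, N = 4, 20, 12, 500    # latent features, visual/auditory dims

# Ground truth: shared binary latent features generate both modalities.
Z = rng.binomial(1, 0.5, size=(N, K)).astype(float)
V = Z @ rng.standard_normal((K, Dv)) + 0.1 * rng.standard_normal((N, Dv))
Au = Z @ rng.standard_normal((K, Da)) + 0.1 * rng.standard_normal((N, Da))

# Multisensory training: learn one latent code from both modalities.
X = np.concatenate([V, Au], axis=1)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
load_v, load_a = Vt[:K, :Dv], Vt[:K, Dv:]   # per-modality loadings

# Unisensory test: from vision alone, infer the shared latents and
# predict the 'missing' auditory representation (point (d) above).
z_hat = V @ np.linalg.pinv(load_v)
a_pred = z_hat @ load_a
print("auditory reconstruction corr:",
      np.corrcoef(a_pred.ravel(), Au.ravel())[0, 1].round(2))
```

Because the inferred latent code is the same whichever modality supplies the input, the sketch also shows, in miniature, why multisensory training can help even when testing is unisensory.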