Many organisms and objects deform nonrigidly when moving, requiring perceivers to separate shape changes from object motions. Surprisingly, observers' ability to correctly infer nonrigid volumetric shapes from motion cues has not been measured, and structure-from-motion models predominantly rely on variants of rigidity assumptions. We show that observers are equally sensitive at discriminating cross-sections of flexing and rigid cylinders from motion cues when the cylinders are rotated simultaneously around the vertical and depth axes. A computational model based on motion perspective (i.e., assuming perceived depth is inversely proportional to local velocity) predicted the psychometric curves better than shape-from-motion factorization models using shape or trajectory basis functions. Asymmetric percepts of symmetric cylinders, arising from asymmetric velocity profiles, provided additional evidence for the dominant role of relative velocity in shape perception. Finally, we show that inexperienced observers are generally incapable of using motion cues to detect inflation or deflation of rigid and flexing cylinders, but this handicap can be overcome with practice for both nonrigid and rigid shapes. The empirical and computational results of this study argue against the use of rigidity assumptions in extracting 3-D shape from motion and for the primacy of motion deformations computed from motion shears.

Keywords: optic flow | structure from motion
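The motion-perspective model in this abstract assumes perceived depth is inversely proportional to local image velocity. A minimal sketch of that mapping, assuming lateral translation under perspective projection (the function name and the toy demo are illustrative, not the paper's implementation):

```python
import numpy as np

def perceived_depth_from_velocity(image_velocities, k=1.0, eps=1e-9):
    """Motion-perspective heuristic: perceived depth is taken to be
    inversely proportional to local image speed (nearer points move
    faster under perspective projection)."""
    speeds = np.abs(np.asarray(image_velocities, dtype=float))
    return k / (speeds + eps)   # eps avoids division by zero for static dots

# Toy check: for laterally translating points at depths Z, perspective
# projection with focal length f and speed V gives image speed f*V/Z,
# so the heuristic recovers depth up to the scale factor k = f*V.
f, V = 1.0, 2.0
true_depth = np.array([2.0, 4.0, 8.0])
image_speed = f * V / true_depth
recovered = perceived_depth_from_velocity(image_speed, k=f * V)
```

The heuristic yields depth only up to an unknown scale factor, which is consistent with the abstract's claim that relative (not absolute) velocity drives perceived shape.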
The study of cross-modal influences in perception, particularly between the auditory and visual modalities, has intensified recently. This paper reports a comprehensive study of auditory-visual cross-modal influences in motion, including motion aftereffects (MAEs). We examined both auditory influences on visual perception and vice versa. Motion interactions were examined using three directional pairings, or configurations: along the horizontal, vertical, and depth axes. In Experiment 1 we assessed how the simultaneous presence of a strong motion signal in one modality affected the perception of motion in the other modality. To investigate whether such influences have longer-term effects, in Experiment 2 we tested whether adaptation in one modality alone could produce cross-modal MAEs. Overall, the pattern of results was similar across all directional pairings, with the strongest cross-modal influences observed for motion along the horizontal axis; this is likely due to the greater co-localization of the two stimuli in this configuration. Although auditory and visual stimuli each affected the other modality when presented simultaneously, significant cross-modally induced aftereffects could be produced only by visual stimuli. However, we did observe a vertical visual MAE following adaptation to auditory spectral motion. These results are discussed in terms of current psychophysical and neurophysiological findings concerning how auditory-visual signals are processed.
We studied how stimulus attributes (angle polarity and perspective) and data-driven signals (motion parallax and binocular disparity) affect the recovery of 3-D shape. We used physical stimuli consisting of two congruent trapezoids forming a dihedral angle. To study the effects of the stimulus attributes, we used 2 × 2 combinations of convex/concave angles and proper/reverse perspective cues. To study the effects of binocular disparity and motion parallax, we used 2 × 2 combinations of monocular/binocular viewing with moving/stationary observers. The task was to report the depth of the right vertical edge relative to a fixation point positioned at a different depth. In Experiment 1, observers also had the option of reporting that the right vertical edge and the fixation point were at the same depth. In Experiment 2, however, observers were given only two response options: is the right vertical edge in front of or behind the fixation point? We found that across all stimulus configurations, perspective is a stronger cue than angle polarity for recovering 3-D shape; we also confirmed the bias to perceive angles as convex rather than concave. In terms of data-driven signals, binocular disparity recovered 3-D shape better than motion parallax. Interestingly, motion parallax improved performance under monocular viewing but not under binocular viewing.
The visual system can learn to use information in new ways to construct appearance. Thus, signals such as the location or translation direction of an ambiguously rotating wire-frame cube, which are normally uninformative, can be learned as cues to determine the rotation direction [1]. This perceptual learning occurs when the formerly uninformative signal is statistically associated with long-trusted visual cues (such as binocular disparity) that disambiguate appearance during training. In previous demonstrations, the newly learned cue was intrinsic to the perceived object, in that the signal was conveyed by the same image elements as the object itself. Here, we used new extrinsic signals and observed no learning. We correlated three new signals with long-trusted cues in the rotating-cube paradigm: one cross-modal (an auditory signal) and two within modality (visual). Cue recruitment did not occur in any of these conditions, either in single sessions or in ten sessions across as many days. These results suggest that the intrinsic/extrinsic distinction is important for the perceptual system in determining whether it can learn and use new information from the environment to construct appearance. Extrinsic cues do have perceptual effects (e.g., the "bounce-pass" illusion [2] and the McGurk effect [3]), so we speculate that extrinsic signals can be recruited for perception, but only if certain conditions are met. These conditions might specify the age of the observer, the strength of the long-trusted cues, or the amount of exposure to the correlation.
The apparent direction of rotation of perceptually bistable wire-frame (Necker) cubes can be conditioned to depend on retinal location by interleaving their presentation with cubes that are disambiguated by depth cues (Haijiang, Saunders, Stone, & Backus, 2006; Harrison & Backus, 2010a). The long-term nature of the learned bias is demonstrated by its resistance to counter-conditioning on a consecutive day. In previous work, either binocular disparity and occlusion, or a combination of monocular depth cues that included occlusion, internal occlusion, haze, and depth-from-shading, was used to control the rotation direction of the disambiguated cubes. Here, we tested the relative effectiveness of these two cue sets in establishing the retinal-location bias. Both cue sets were highly effective in establishing a perceptual bias on Day 1, as measured by the perceived rotation direction of ambiguous cubes. The effect of counter-conditioning on Day 2 on the perceptual outcome for ambiguous cubes was independent of whether the cue set was the same as or different from that used on Day 1. This invariance suggests that a common neural population instantiates the bias for rotation direction, regardless of the cue set used. However, in a further experiment in which only disambiguated cubes were presented on Day 1, the perceptual outcome of ambiguous cubes during Day 2 counter-conditioning showed that the monocular-only cue set was in fact more effective than disparity-plus-occlusion at causing long-term learning of the bias. These results can be reconciled if the conditioning effect of the Day 1 ambiguous trials in the first experiment is taken into account (Harrison & Backus, 2010b). We suggest that monocular disambiguation leads to a stronger bias either because it more strongly activates a single neural population that is necessary for perceiving rotation, or because ambiguous stimuli engage cortical areas that are also engaged by monocularly disambiguated stimuli but not by disparity-disambiguated stimuli.
Most moving objects in the world are non-rigid, changing shape as they move. To disentangle shape changes from movements, computational models either fit shapes to combinations of basis shapes or motion trajectories to combinations of oscillations but are biologically unfeasible in their input requirements. Recent neural models parse shapes into stored examples, which are unlikely to exist for general shapes. We propose that extracting shape attributes, e.g., symmetry, facilitates veridical perception of non-rigid motion. In a new method, identical dots were moved in and out along invisible spokes, to simulate the rotation of dynamically and randomly distorting shapes. Discrimination of rotation direction measured as a function of non-rigidity was 90% as efficient as the optimal Bayesian rotation decoder and ruled out models based on combining the strongest local motions. Remarkably, for non-rigid symmetric shapes, observers outperformed the Bayesian model when perceived rotation could correspond only to rotation of global symmetry, i.e., when tracking of shape contours or local features was uninformative. That extracted symmetry can drive perceived motion suggests that shape attributes may provide links across the dorsal–ventral separation between motion and shape processing. Consequently, the perception of non-rigid object motion could be based on representations that highlight global shape attributes.
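The stimulus described above, identical dots moving radially along invisible spokes to simulate a rotating and randomly distorting shape, can be sketched as follows (the parameter names and the 3-lobed radial profile are illustrative assumptions, not the paper's exact stimulus):

```python
import numpy as np

def spoke_stimulus(n_dots=20, n_frames=60, omega=0.05, jitter=0.1, seed=0):
    """Each dot is confined to a fixed, invisible spoke; rotation is
    conveyed only by radial motion, because the rotating radial profile
    of a (randomly distorting) shape is resampled at fixed spoke angles."""
    rng = np.random.default_rng(seed)
    spokes = np.linspace(0.0, 2.0 * np.pi, n_dots, endpoint=False)  # fixed angles
    frames = []
    for t in range(n_frames):
        # 3-lobed shape rotated by omega*t, sampled at the fixed spokes
        r = 1.0 + 0.3 * np.sin(3.0 * (spokes - omega * t))
        r += jitter * rng.standard_normal(n_dots)   # per-frame non-rigid distortion
        r = np.clip(r, 0.05, None)                  # keep every dot on its spoke
        frames.append(np.column_stack((r * np.cos(spokes), r * np.sin(spokes))))
    return np.asarray(frames)                       # shape (n_frames, n_dots, 2)
```

Because the spoke angles never change, any perceived rotation direction must be inferred from the pattern of radial motions rather than from tracking individual dots, which is what makes the stimulus diagnostic.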