Visual motion can be represented in terms of the dynamic visual features in the retinal image or in terms of the moving surfaces in the environment that give rise to these features. For natural images, the two types of representation are necessarily quite different because many moving features are only spuriously related to the motion of surfaces in the visual scene. Such "extrinsic" features arise at occlusion boundaries and may be detected by virtue of the depth-ordering cues that exist at those boundaries. Although a number of studies have provided evidence of the impact of depth ordering on the perception of visual motion, few attempts have been made to identify the neuronal substrate of this interaction. To address this issue, we devised a simple contextual manipulation that decouples surface motion from the motions of visual image features. Altering the depth ordering between a moving pattern and abutting static regions dramatically changes the perceived direction of motion while image motion remains constant. When stimulated with these displays, many neurons in the primate middle temporal visual area (area MT) represent the implied surface motion rather than the motion of retinal image features. These neurons thus use contextual depth-ordering information to achieve a representation of the visual scene consistent with perceptual experience.
Key words: motion perception; psychophysics; neurophysiology; binocular disparity; extrastriate; monkey
The locally measured motion of a one-dimensional visual image feature, such as an edge, is ambiguous (Wohlgemuth, 1911; Wallach, 1935; Marr and Ullman, 1981). This is known as the "aperture problem." This ambiguity can, in principle, be overcome by measuring the unambiguous motion of a two-dimensional visual image feature, such as where two edges of a surface meet to form a corner. Many two-dimensional visual image features, however, occur where edges from two different but overlapping surfaces meet. Such compound features are "intrinsic" to neither surface and have been termed "extrinsic." Shimojo et al. (1989) demonstrated that human observers differentiate intrinsic and extrinsic features on the basis of depth-ordering cues that exist at occlusion boundaries. Furthermore, these investigators discovered that intrinsic features are used to overcome the aperture problem, whereas extrinsic features have relatively little influence. By allowing classification of image features as either intrinsic or extrinsic to a moving surface, depth-ordering cues thus provide a context for the correct interpretation of ambiguous motion information.
To explore this contextual motion-depth interaction, we developed a variation of the classic barber-pole illusion (Wallach, 1935). Our "barber-diamond" stimuli (see Fig. 1) consist of a moving grating framed by a static, diamond-shaped aperture. Two of the four textured panels that define the aperture are placed in front of the grating via stereoscopic depth cues, and the other two are placed behind. These depth manipulations simulate pa...