The initial image-processing stages of visual cortex are well suited to a local (patchwise) analysis of the viewed scene. But the world's structures extend over space as textures and surfaces, suggesting the need for spatial integration. Most models of contrast vision fall shy of this process because (i) the weak area summation at detection threshold is attributed to probability summation (PS) and (ii) there is little or no advantage of area well above threshold. Both of these views are challenged here. First, it is shown that results at threshold are consistent with linear summation of contrast following retinal inhomogeneity, spatial filtering, nonlinear contrast transduction and multiple sources of additive Gaussian noise. We suggest that the suprathreshold loss of the area advantage in previous studies is due to a concomitant increase in suppression from the pedestal. To overcome this confound, a novel stimulus class is designed where: (i) the observer operates on a constant retinal area, (ii) the target area is controlled within this summation field, and (iii) the pedestal is fixed in size. Using this arrangement, substantial summation is found along the entire masking function, including the region of facilitation. Our analysis shows that PS and uncertainty cannot account for the results, and that suprathreshold summation of contrast extends over at least seven target cycles of grating.
Abstract-Visual mechanisms in primary visual cortex are suppressed by the superposition of gratings perpendicular to their preferred orientations. A clear picture of this process is needed to (i) inform functional architecture of image-processing models, (ii) identify the pathways available to support binocular rivalry, and (iii) generally advance our understanding of early vision. Here we use monoptic sine-wave gratings and cross-orientation masking (XOM) to reveal two crossoriented suppressive pathways in humans, both of which occur before full binocular summation of signals. One is a within-eye (ipsiocular) pathway that is spatially broadband, immune to contrast adaptation and has a suppressive weight that tends to decrease with stimulus duration. The other pathway operates between the eyes (interocular), is spatially tuned, desensitizes with contrast adaptation and has a suppressive weight that increases with stimulus duration. When cross-oriented masks are presented to both eyes, masking is enhanced or diminished for conditions in which either ipsiocular or interocular pathways dominate masking, respectively. We propose that ipsiocular suppression precedes the influence of interocular suppression and tentatively associate the two effects with the lateral geniculate nucleus (or retina) and the visual cortex respectively. The interocular route is a good candidate for the initial pathway involved in binocular rivalry and predicts that interocular cross-orientation suppression should be found in cortical cells with predominantly ipsiocular drive.
Contrast sensitivity improves with the area of a sine-wave grating, but why? Here we assess this phenomenon against contemporary models involving spatial summation, probability summation, uncertainty, and stochastic noise. Using a two-interval forced-choice procedure we measured contrast sensitivity for circular patches of sine-wave gratings with various diameters that were blocked or interleaved across trials to produce low and high extrinsic uncertainty, respectively. Summation curves were steep initially, becoming shallower thereafter. For the smaller stimuli, sensitivity was slightly worse for the interleaved design than for the blocked design. Neither area nor blocking affected the slope of the psychometric function. We derived model predictions for noisy mechanisms and extrinsic uncertainty that was either low or high. The contrast transducer was either linear (c(1.0)) or nonlinear (c(2.0)), and pooling was either linear or a MAX operation. There was either no intrinsic uncertainty, or it was fixed or proportional to stimulus size. Of these 10 canonical models, only the nonlinear transducer with linear pooling (the noisy energy model) described the main forms of the data for both experimental designs. We also show how a cross-correlator can be modified to fit our results and provide a contemporary presentation of the relation between summation and the slope of the psychometric function.
We assessed summation of contrast across eyes and area at detection threshold (C(t)). Stimuli were sine-wave gratings (2.5 c/deg) spatially modulated by cosine- and anticosine-phase raised plaids (0.5 c/deg components oriented at +/-45 degrees ). When presented dichoptically the signal regions were interdigitated across eyes but produced a smooth continuous grating following their linear binocular sum. The average summation ratio (C(t1)/([C(t1+2)]) for this stimulus pair was 1.64 (4.3 dB). This was only slightly less than the binocular summation found for the same patch type presented to both eyes, and the area summation found for the two different patch types presented to the same eye. We considered 192 model architectures containing each of the following four elements in all possible orders: (i) linear summation or a MAX operator across eyes, (ii) linear summation or a MAX operator across area, (iii) linear or accelerating contrast transduction, and (iv) additive Gaussian, stochastic noise. Formal equivalences reduced this to 62 different models. The most successful four-element model was: linear summation across eyes followed by nonlinear contrast transduction, linear summation across area, and late noise. Model performance was enhanced when additional nonlinearities were placed before binocular summation and after area summation. The implications for models of probability summation and uncertainty are discussed.
In an isolated syllable, a formant will tend to be segregated perceptually if its fundamental frequency (F0) differs from that of the other formants. This study explored whether similar results are found for sentences, and specifically whether differences in F0 (ΔF0) also influence across-formant grouping in circumstances where the exclusion or inclusion of the manipulated formant critically determines speech intelligibility. Three-formant (F1 + F2 + F3) analogues of almost continuously voiced natural sentences were synthesized using a monotonous glottal source (F0 = 150 Hz). Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3; F2), where F2C is a competitor for F2 that listeners must resist to optimize recognition. Competitors were created using time-reversed frequency and amplitude contours of F2, and F0 was manipulated (ΔF0 = ± 8, ± 2, or 0 semitones relative to the other formants). Adding F2C typically reduced intelligibility, and this reduction was greatest when ΔF0 = 0. There was an additional effect of absolute F0 for F2C, such that competitor efficacy was greater for higher F0s. However, competitor efficacy was not due to energetic masking of F3 by F2C. The results are consistent with the proposal that a grouping "primitive" based on common F0 influences the fusion and segregation of concurrent formants in sentence perception.
Speech comprises dynamic and heterogeneous acoustic elements, yet it is heard as a single perceptual stream even when accompanied by other sounds. The relative contributions of grouping "primitives" and of speech-specific grouping factors to the perceptual coherence of speech are unclear, and the acoustical correlates of the latter remain unspecified. The parametric manipulations possible with simplified speech signals, such as sine-wave analogues, make them attractive stimuli to explore these issues. Given that the factors governing perceptual organization are generally revealed only where competition operates, the second-formant competitor (F2C) paradigm was used, in which the listener must resist competition to optimize recognition [Remez, R. E., et al. (1994). Psychol. Rev. 101, 129-156]. Three-formant (F1+F2+F3) sine-wave analogues were derived from natural sentences and presented dichotically (one ear=F1+F2C+F3; opposite ear=F2). Different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, regardless of their amplitude characteristics. In contrast, F2Cs with constant frequency contours were completely ineffective. Competitor efficacy was not due to energetic masking of F3 by F2C. These findings indicate that modulation of the frequency, but not the amplitude, contour is critical for across-formant grouping.
In psychophysics, cross-orientation suppression (XOS) and cross-orientation facilitation (XOF) have been measured by investigating mask configuration on the detection threshold of a centrally placed patch of sine-wave grating. Much of the evidence for XOS and XOF comes from studies using low and high spatial frequencies, respectively, where the interactions are thought to arise from within (XOS) and outside (XOF) the footprint of the classical receptive field. We address the relation between these processes here by measuring the effects of various sizes of superimposed and annular cross-oriented masks on detection thresholds at two spatial scales (1 and 7 c/deg) and on contrast increment thresholds at 7 c/deg. A functional model of our results indicates the following (1) XOS and XOF both occur for superimposed and annular masks. (2) XOS declines with spatial frequency but XOF does not. (3) The spatial extent of the interactions does not scale with spatial frequency, meaning that surround-effects are seen primarily at high spatial frequencies. (4) There are two distinct processes involved in XOS: direct divisive suppression and modulation of self-suppression. (5) Whether XOS or XOF wins out depends upon their relative weights and mask contrast. These results prompt enquiry into the effect of spatial frequency at the single-cell level and place new constraints on image-processing models of early visual processing.
How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency, but not its amplitude, contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1+F2+F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1+F2C+F3C; F2+ F3), where F2C+F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formantfrequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0= constant frequencies). Competitor amplitude contours were either constant, or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. Similarity in the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when segregating the speech of concurrent talkers, differences in speech rate may not be a significant cue for across-frequency grouping of formants.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.