The invariant properties of human cortical neurons cannot be studied directly by fMRI due to its limited spatial resolution. Here, we circumvented this limitation by using fMR adaptation, namely, reduction of the fMR signal due to repeated presentation of identical images. Object-selective regions (lateral occipital complex [LOC]) showed a monotonic signal decrease as repetition frequency increased. The invariant properties of fMR adaptation were studied by presenting the same object in different viewing conditions. LOC exhibited stronger fMR adaptation to changes in size and position (more invariance) compared to illumination and viewpoint. The effect revealed two putative subdivisions within LOC: caudal-dorsal (LO), which exhibited substantial recovery from adaptation under all transformations, and posterior fusiform (PF/LOa), which displayed stronger adaptation. This study demonstrates the utility of fMR adaptation for revealing functional characteristics of neurons in fMRI studies.
The visual recognition of three-dimensional (3-D) objects on the basis of their shape poses at least two difficult problems. First, there is the problem of variable illumination, which can be addressed by working with relatively stable features such as intensity edges rather than the raw intensity images. Second, there is the problem of the initially unknown pose of the object relative to the viewer. In one approach to this problem, a hypothesis is first made about the viewpoint, then the appearance of a model object from such a viewpoint is computed and compared with the actual image. Such recognition schemes generally employ 3-D models of objects, but the automatic learning of 3-D models is itself a difficult problem. To address this problem in computational vision, we have developed a scheme, based on the theory of approximation of multivariate functions, that learns from a small set of perspective views a function mapping any viewpoint to a standard view. A network equivalent to this scheme will thus 'recognize' the object on which it was trained from any viewpoint.
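The view-to-standard-view mapping described above can be sketched as a small Gaussian radial basis function network with one unit per stored example view. Everything concrete below (the six-point wireframe object, the orthographic projection, the training viewpoints, the kernel width) is a hypothetical toy setup for illustration, not the original scheme's data:

```python
import numpy as np

def project(points3d, theta):
    """Orthographic 2-D view of a 3-D point set after rotation about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (points3d @ R.T)[:, :2].ravel()          # flattened image coordinates

def rbf(X, centers, sigma=2.0):
    """One Gaussian basis unit per stored example view."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
obj = rng.normal(size=(6, 3))                       # hypothetical wireframe object

thetas = np.linspace(-0.6, 0.6, 5)                  # small set of training viewpoints
X = np.stack([project(obj, t) for t in thetas])     # inputs: example views
Y = np.tile(project(obj, 0.0), (len(thetas), 1))    # target: the standard view

W, *_ = np.linalg.lstsq(rbf(X, X), Y, rcond=None)   # train by linear least squares

def to_standard_view(view2d):
    """Network output: the predicted standard view for a new 2-D view."""
    return rbf(view2d[None, :], X) @ W
```

A novel view of the trained object should map close to the standard view, while a view of a different object should not; thresholding that residual yields the "recognize from any viewpoint" behaviour the abstract describes.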
Does the human brain represent objects for recognition by storing a series of two-dimensional snapshots, or are the object models, in some sense, three-dimensional analogs of the objects they represent? One way to address this question is to explore the ability of the human visual system to generalize recognition from familiar to unfamiliar views of three-dimensional objects. Recently proposed theories of object recognition, such as viewpoint normalization or alignment of three-dimensional models [Ullman, S. (1989); Poggio, T. & Edelman, S. (1990) Nature (London) 343, 263-266], predict different patterns of generalization to unfamiliar views. We have exploited the conflicting predictions to test the three theories directly in a psychophysical experiment involving computer-generated three-dimensional objects. Our results suggest that the human visual system is better described as recognizing these objects by two-dimensional view interpolation than by alignment or other methods that rely on object-centered three-dimensional models.
How does the human visual system represent objects for recognition? The experiments we describe address this question by testing the ability of human subjects (and of computer models instantiating particular theories of recognition) to generalize from familiar to unfamiliar views of visually novel objects. Because different theories predict different patterns of generalization according to the experimental conditions, this approach yields concrete evidence in favor of some of the theories and contradicts others.
Theories That Rely on Three-Dimensional Object-Centered Representations
The first class of theories we have considered (1-3) represents objects by three-dimensional (3D) models, encoded in a viewpoint-independent fashion. One such approach, recognition by alignment (1), compares the input image with the projection of a stored model after the two are brought into register.
The transformation necessary to achieve this registration is computed by matching a small number of features in the image with the corresponding features in the model. The aligning transformation is computed separately for each of the models stored in the system. Recognition is declared for the model that fits the input most closely after the two are aligned, if the residual dissimilarity between them is small enough. The decision criterion for recognition in this case can be stated in the following simplified form: ‖P T x(3D) − x(2D)‖ < ε, [1] where T is the aligning transformation, P is a 3D-to-two-dimensional (2D) projection operator, and the norm ‖·‖ measures the dissimilarity between the projection of the transformed 3D model x(3D) and the input image x(2D). The recognition decision is then made by comparing the measured dissimilarity against the threshold ε. One may make a further distinction between full alignment, which uses 3D models and attempts to compensate for 3D transformations of objects (such as rotation in depth), and the alignment of pictorial descriptions, which uses multiple views rather t...
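Decision rule [1] can be made concrete in a few lines. The model points, the pose, and the tolerance below are illustrative assumptions, and the projection P is taken to be orthographic for simplicity:

```python
import numpy as np

def rot_y(theta):
    """Rotation about the y-axis: one simple aligning transformation T."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def residual(model3d, image2d, R, t):
    """|| P T x(3D) - x(2D) ||: dissimilarity after aligning and projecting."""
    transformed = model3d @ R.T + t     # T: rotate and translate the 3-D model
    projected = transformed[:, :2]      # P: orthographic 3-D -> 2-D projection
    return np.linalg.norm(projected - image2d)

model = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.],
                  [0., 0., 1.], [1., 1., 1.]])      # hypothetical 3-D model
R, t = rot_y(0.4), np.array([0.2, -0.1, 0.0])       # recovered aligning pose
image = (model @ R.T + t)[:, :2]                    # the observed 2-D input

eps = 1e-6                                          # decision threshold
recognized = residual(model, image, R, t) < eps
```

In a full system the residual is computed for every stored model, each with its own aligning transformation recovered from feature matches, and the best-fitting model below threshold wins.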
In many different spatial discrimination tasks, such as in determining the sign of the offset in a vernier stimulus, the human visual system exhibits hyperacuity-level performance by evaluating spatial relations with the precision of a fraction of a photoreceptor's diameter. We propose that this impressive performance depends in part on a fast learning process that uses relatively few examples and occurs at an early processing stage in the visual pathway. We show that this hypothesis is plausible by demonstrating that it is possible to synthesize, from a small number of examples of a given task, a simple (HyperBF) network that attains the required performance level. We then verify with psychophysical experiments some of the key predictions of our conjecture. In particular, we show that fast stimulus-specific learning indeed takes place in the human visual system and that this learning does not transfer between two slightly different hyperacuity tasks. This paper describes research done at the Massachusetts Institute of Technology, 1991.
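The claim that a small basis-function network trained on few examples can perform such a discrimination can be illustrated with a one-dimensional toy version of the vernier task. The offsets, kernel width, and least-squares training below are illustrative assumptions, not the study's actual stimuli or HyperBF network:

```python
import numpy as np

# A handful of training stimuli: vernier offsets with known sign labels.
offsets = np.linspace(-0.2, 0.2, 8)      # eight examples, none exactly zero
labels = np.sign(offsets)

def gauss(x, centers, sigma=0.05):
    """Gaussian basis units centred on the stored example stimuli."""
    return np.exp(-(np.atleast_1d(x)[:, None] - centers[None, :]) ** 2
                  / (2.0 * sigma ** 2))

# 'Fast learning': a single linear least-squares solve for the output weights.
w, *_ = np.linalg.lstsq(gauss(offsets, offsets), labels, rcond=None)

def judge_sign(x):
    """Network's judgement of the vernier offset sign for unseen stimuli."""
    return np.sign(gauss(x, offsets) @ w)
```

Because the units are centred on the training stimuli, the learned discrimination is stimulus-specific, which is consistent with the reported lack of transfer between slightly different hyperacuity tasks.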
Functional magnetic resonance imaging was used in combined functional selectivity and retinotopic mapping tests to reveal object-related visual areas in the human occipital lobe. Subjects were tested with right, left, up, or down hemivisual field stimuli which were composed of images of natural objects (faces, animals, man-made objects) or highly scrambled (1,024 elements) versions of the same images. In a similar fashion, the horizontal and vertical meridians were mapped to define the borders of these areas. Concurrently, the same cortical sites were tested for their sensitivity to image-scrambling by varying the number of scrambled picture fragments (from 16-1,024) while controlling for the Fourier power spectrum of the pictures and their order of presentation. Our results reveal a stagewise decrease in retinotopy and an increase in sensitivity to image-scrambling. Three main distinct foci were found in the human visual object recognition pathway (Ungerleider and Haxby [1994]: Curr Opin Neurobiol 4:157-165): 1) Retinotopic primary areas V1-3 did not exhibit significant reduction in activation to scrambled images. 2) Areas V4v (Sereno et al. [1995]: Science 268:889-893) and V3A (DeYoe et al. [1996]: Proc Natl Acad Sci USA 93:2382-2386; Tootell et al. [1997]: J Neurosci 17:7060-7078) manifested both retinotopy and decreased activation to highly scrambled images. 3) The essentially nonretinotopic lateral occipital complex (LO) (Malach et al. [1995]: Proc Natl Acad Sci USA 92:8135-8139; Tootell et al. [1996]: Trends Neurosci 19:481-489) exhibited the highest sensitivity to image scrambling, and appears to be homologous to the macaque infero-temporal (IT) cortex (Tanaka [1996]: Curr Opin Neurobiol 523-529). Breaking the images into 64, 256, or 1,024 randomly scrambled blocks reduced activation in LO voxels. However, many LO voxels remained significantly activated by mildly scrambled images (16 blocks).
These results suggest the existence of object-fragment representation in LO.
Animal acoustic communication often takes the form of complex sequences, made up of multiple distinct acoustic units. Apart from the well-known example of birdsong, other animals such as insects, amphibians, and mammals (including bats, rodents, primates, and cetaceans) also generate complex acoustic sequences. Occasionally, such as with birdsong, the adaptive role of these sequences seems clear (e.g. mate attraction and territorial defence). More often however, researchers have only begun to characterise – let alone understand – the significance and meaning of acoustic sequences. Hypotheses abound, but there is little agreement as to how sequences should be defined and analysed. Our review aims to outline suitable methods for testing these hypotheses, and to describe the major limitations to our current and near-future knowledge on questions of acoustic sequences. This review and prospectus is the result of a collaborative effort between 43 scientists from the fields of animal behaviour, ecology and evolution, signal processing, machine learning, quantitative linguistics, and information theory, who gathered for a 2013 workshop entitled, “Analysing vocal sequences in animals”. Our goal is to present not just a review of the state of the art, but to propose a methodological framework that summarises what we suggest are the best practices for research in this field, across taxa and across disciplines. We also provide a tutorial-style introduction to some of the most promising algorithmic approaches for analysing sequences. We divide our review into three sections: identifying the distinct units of an acoustic sequence, describing the different ways that information can be contained within a sequence, and analysing the structure of that sequence. Each of these sections is further subdivided to address the key questions and approaches in that area. 
We propose a uniform, systematic, and comprehensive approach to studying sequences, with the goal of clarifying research terms used in different fields, and facilitating collaboration and comparative studies. Allowing greater interdisciplinary collaboration will facilitate the investigation of many important questions in the evolution of communication and sociality.
The extent to which primary visual cues such as motion or luminance are segregated in different cortical areas is a subject of controversy. To address this issue, we examined cortical activation in the human occipital lobe using functional magnetic resonance imaging (fMRI) while subjects performed a fixed visual task, object recognition, using three different primary visual cues: motion, texture, or luminance contrast. In the first experiment, a region located on the lateral aspect of the occipital lobe (LO complex) was preferentially activated in all 11 subjects both by luminance and motion-defined object silhouettes compared to full-field moving and stationary noise (ratios, 2.00+/-0.19 and 1.86+/-0.65, respectively). In the second experiment, all subjects showed enhanced activation in the LO complex to objects defined both by luminance and texture contrast compared to full-field texture patterns (ratios, 1.43+/-0.08 and 1.32+/-0.08, respectively). An additional smaller dorsal focus that exhibited convergence of object-related cues appeared to correspond to area V3a or a region slightly anterior to it. These results show convergence of visual cues in LO and provide strong evidence for its role in object processing.
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The ADIOS (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.
computational linguistics | grammar induction | language acquisition | machine learning | protein classification
Many types of sequential symbolic data possess structure that is (i) hierarchical and (ii) context-sensitive. Natural-language text and transcribed speech are prime examples of such data: a corpus of language consists of sentences defined over a finite lexicon of symbols such as words. Linguists traditionally analyze the sentences into recursively structured phrasal constituents (1); at the same time, a distributional analysis of partially aligned sentential contexts (2) reveals in the lexicon clusters that are said to correspond to various syntactic categories (such as nouns or verbs). Such structure, however, is not limited to the natural languages; recurring motifs are found, on a level of description that is common to all life on earth, in the base sequences of DNA that constitute the genome.
We introduce an unsupervised algorithm that discovers hierarchical structure in any sequence data, on the basis of the minimal assumption that the corpus at hand contains partially overlapping strings at multiple levels of organization. In the linguistic domain, our algorithm has been successfully tested both on artificial-grammar output and on natural-language corpora such as ATIS (3), CHILDES (4), and the Bible (5). In bioinformatics, the algorithm has been shown to extract from protein sequences syntactic structures that are highly correlated with the functional properties of these proteins.
The ADIOS Algorithm for Grammar-Like Rule Induction
In a machine learning paradigm for grammar induction, a teacher produces a sequence of strings generated by a grammar G0, and a learner uses the resulting corpus to construct a grammar G, aiming to approximate G0 in some sense (6). Recent evidence suggests that natural language acquisition involves both statistical computation (e.g., in speech segmentation) and rule-like algebraic processes (e.g., in structured generalization) (7-11). Modern computatio...
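A heavily simplified, hypothetical illustration of the pattern-distillation idea (not ADIOS's actual MEX significance criterion or graph representation): find the most frequent recurring token trigram in a toy corpus and rewrite it as a single higher-level unit, the kind of step that, applied recursively, yields a hierarchy of patterns.

```python
from collections import Counter

# Toy corpus (hypothetical); real inputs would be large text or protein corpora.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a bird sat on the fence",
]

def ngram_counts(sentences, n):
    """Count all length-n token windows across the corpus."""
    counts = Counter()
    for s in sentences:
        toks = s.split()
        for i in range(len(toks) - n + 1):
            counts[tuple(toks[i:i + n])] += 1
    return counts

# One distillation step: promote the most frequent trigram to a new unit 'P1'.
pattern, freq = ngram_counts(corpus, 3).most_common(1)[0]
distilled = [s.replace(" ".join(pattern), "P1") for s in corpus]
```

Re-running the same step on `distilled` (treating `P1` as an ordinary token) is what makes the resulting structure hierarchical; the full algorithm additionally generalizes by clustering tokens that fill equivalent slots around a pattern.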