The ventral visual stream underlies key human visual object recognition abilities. However, neural encoding in the higher areas of the ventral stream remains poorly understood. Here, we describe a modeling approach that yields a quantitatively accurate model of inferior temporal (IT) cortex, the highest ventral cortical area. Using high-throughput computational techniques, we discovered that, within a class of biologically plausible hierarchical neural network models, there is a strong correlation between a model's categorization performance and its ability to predict individual IT neural unit response data. To pursue this idea, we then identified a high-performing neural network that matches human performance on a range of recognition tasks. Critically, even though we did not constrain this model to match neural data, its top output layer turns out to be highly predictive of IT spiking responses to complex naturalistic images at both the single site and population levels. Moreover, the model's intermediate layers are highly predictive of neural responses in the V4 cortex, a midlevel visual area that provides the dominant cortical input to IT. These results show that performance optimization, applied in a biologically appropriate model class, can be used to build quantitative predictive models of neural processing.

computational neuroscience | computer vision | array electrophysiology

Retinal images of real-world objects vary drastically due to changes in object pose, size, position, lighting, nonrigid deformation, occlusion, and many other sources of noise and variation. Humans effortlessly recognize objects rapidly and accurately despite this enormous variation, an impressive computational feat (1). This ability is supported by a set of interconnected brain areas collectively called the ventral visual stream (2, 3), with homologous areas in nonhuman primates (4).
The ventral stream is thought to function as a series of hierarchical processing stages (5-7) that encode image content (e.g., object identity and category) increasingly explicitly in successive cortical areas (1, 8, 9). For example, neurons in the lowest area, V1, are well described by Gabor-like edge detectors that extract rough object outlines (10), although the V1 population does not show robust tolerance to complex image transformations (9). Conversely, rapidly evoked population activity in top-level inferior temporal (IT) cortex can directly support real-time, invariant object categorization over a wide range of tasks (11, 12). Midlevel ventral areas, such as V4, the dominant cortical input to IT, exhibit intermediate levels of object selectivity and variation tolerance (12-14).

Significant progress has been made in understanding lower ventral areas such as V1, where conceptually compelling models have been discovered (10). These models are also quantitatively accurate and can predict response magnitudes of individual neuronal units to novel image stimuli. Higher ventral cortical areas, especially V4 and IT, have been much more difficult to understand. Al...
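The layer-to-neural predictivity at the heart of this approach is conventionally measured by fitting a regularized linear mapping from model-layer features to recorded responses and scoring held-out predictions per recording site. The sketch below illustrates that procedure on simulated data; the array shapes, ridge penalty, and train/test split are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: model-layer features and IT site responses for 500 images.
n_images, n_features, n_sites = 500, 256, 40
features = rng.standard_normal((n_images, n_features))
# Simulate IT sites as noisy linear functions of the features.
weights = rng.standard_normal((n_features, n_sites))
responses = features @ weights + 0.5 * rng.standard_normal((n_images, n_sites))

# Fit a regularized linear mapping on held-in images; evaluate on held-out ones.
X_tr, X_te, y_tr, y_te = train_test_split(
    features, responses, test_size=0.25, random_state=0
)
mapping = Ridge(alpha=1.0).fit(X_tr, y_tr)
pred = mapping.predict(X_te)

# Predictivity per site: Pearson r between predicted and measured responses.
site_r = [np.corrcoef(pred[:, i], y_te[:, i])[0, 1] for i in range(n_sites)]
print(f"median site predictivity r = {np.median(site_r):.2f}")
```

On this simulated data the mapping is nearly exact; on real neural data, predictivity is limited by trial-to-trial noise and is typically reported relative to a noise ceiling.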
The internal representations of early deep artificial neural networks (ANNs) were found to be remarkably similar to the internal neural representations measured experimentally in the primate brain. Here we ask, as deep ANNs have continued to evolve, are they becoming more or less brain-like? ANNs that are most functionally similar to the brain will contain mechanisms that are most like those used by the brain. We therefore developed Brain-Score, a composite of multiple neural and behavioral benchmarks that scores any ANN on how similar it is to the brain's mechanisms for core object recognition, and we deployed it to evaluate a wide range of state-of-the-art deep ANNs. Using this scoring system, we here report that: (1) DenseNet-169, CORnet-S and ResNet-101 are the most brain-like ANNs. (2) There remains considerable variability in neural and behavioral responses that is not predicted by any ANN, suggesting that no ANN model has yet captured all the relevant mechanisms. (3) Extending prior work, we found that gains in ANN ImageNet performance led to gains on Brain-Score. However, the correlation weakened at ≥70% top-1 ImageNet performance, suggesting that additional guidance from neuroscience is needed to make further advances in capturing brain mechanisms. (4) We uncovered smaller (i.e., less complex) ANNs that are more brain-like than many of the best-performing ImageNet models, which suggests an opportunity to simplify ANNs to better understand the ventral stream. The scoring system used here is far from complete. However, we propose that evaluating and tracking model-benchmark correspondences through a Brain-Score that is regularly updated with new brain data is an exciting opportunity: experimental benchmarks can be used to guide machine network evolution, and machine networks are mechanistic hypotheses of the brain's network and can thus drive the next experiments.
To facilitate both of these, we release Brain-Score.org: a platform that hosts the neural and behavioral benchmarks, where ANNs for visual processing can be submitted to receive a Brain-Score and their rank relative to other models, and where new experimental data can be naturally incorporated.

computational neuroscience | object recognition | deep neural networks
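To illustrate the kind of aggregation a composite benchmark performs, the sketch below ranks a few models by a made-up composite score (here an unweighted mean of per-benchmark scores) and checks its correlation with ImageNet top-1 accuracy. All numbers are invented for illustration; the actual benchmarks, weighting, and ceiling corrections are maintained at Brain-Score.org.

```python
import numpy as np

# Hypothetical (ImageNet top-1, composite Brain-Score) pairs per model —
# illustrative values only, not actual leaderboard numbers.
models = {
    "alexnet":      (0.57, 0.40),
    "resnet-101":   (0.77, 0.52),
    "densenet-169": (0.76, 0.53),
    "cornet-s":     (0.74, 0.54),
}

top1 = np.array([v[0] for v in models.values()])
brain = np.array([v[1] for v in models.values()])

# Rank models by composite score, then measure how strongly ImageNet
# performance tracks brain-likeness across this (tiny) model set.
ranking = sorted(models, key=lambda m: models[m][1], reverse=True)
r = np.corrcoef(top1, brain)[0, 1]
print("ranking by Brain-Score:", ranking)
print(f"ImageNet top-1 vs Brain-Score correlation r = {r:.2f}")
```

Note that in these invented numbers the smaller, lower-top-1 model ranks first, mirroring finding (4): brain-likeness and ImageNet accuracy need not order models identically.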
The primate visual system achieves remarkable visual object recognition performance even in brief presentations, and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. A major difficulty in producing such a comparison accurately has been the lack of a unifying metric that accounts for experimental limitations, such as the amount of noise, the number of neural recording sites, and the number of trials, and computational limitations, such as the complexity of the decoding classifier and the number of classifier training examples. In this work, we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of “kernel analysis” that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT, and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.
To go beyond qualitative models of the biological substrate of object recognition, we ask: can a single ventral stream neuronal linking hypothesis quantitatively account for core object recognition performance over a broad range of tasks? We measured human performance in 64 object recognition tests using thousands of challenging images that explore shape similarity and identity preserving object variation. We then used multielectrode arrays to measure neuronal population responses to those same images in visual areas V4 and inferior temporal (IT) cortex of monkeys and simulated V1 population responses. We tested leading candidate linking hypotheses and control hypotheses, each postulating how ventral stream neuronal responses underlie object recognition behavior. Specifically, for each hypothesis, we computed the predicted performance on the 64 tests and compared it with the measured pattern of human performance. All tested hypotheses based on low- and mid-level visually evoked activity (pixels, V1, and V4) were very poor predictors of the human behavioral pattern. However, simple learned weighted sums of distributed average IT firing rates exactly predicted the behavioral pattern. More elaborate linking hypotheses relying on IT trial-by-trial correlational structure, finer IT temporal codes, or ones that strictly respect the known spatial substructures of IT ("face patches") did not improve predictive power. Although these results do not reject those more elaborate hypotheses, they suggest a simple, sufficient quantitative model: each object recognition task is learned from the spatially distributed mean firing rates (100 ms) of ~60,000 IT neurons and is executed as a simple weighted sum of those firing rates.
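The winning linking hypothesis, a learned weighted sum of distributed mean firing rates, amounts to training a linear readout on population rate vectors. Below is a minimal sketch of such a readout on simulated IT-like data; the population size, noise level, and choice of logistic regression as the linear readout are assumptions for illustration, not the study's exact decoder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Hypothetical IT population: mean firing rates (e.g., in a 100 ms window)
# for two object classes, simulated as class-dependent rate patterns plus noise.
n_trials, n_sites = 400, 200
labels = rng.integers(0, 2, n_trials)
class_means = rng.standard_normal((2, n_sites))
rates = class_means[labels] + 2.0 * rng.standard_normal((n_trials, n_sites))

# Linking hypothesis: task performance is read out as a learned weighted sum
# of distributed mean firing rates, i.e., a linear classifier on rate vectors.
decoder = LogisticRegression(max_iter=1000)
acc = cross_val_score(decoder, rates, labels, cv=5).mean()
print(f"cross-validated decoding accuracy: {acc:.2f}")
```

The key property is that no temporal code or correlational structure is used: each trial is reduced to a single mean-rate vector before the weighted sum is applied.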
Extensive research has revealed that the ventral visual stream hierarchically builds a robust representation for supporting visual object categorization tasks. We systematically explored the ability of multiple ventral visual areas to support a variety of 'category-orthogonal' object properties such as position, size and pose. For complex naturalistic stimuli, we found that the inferior temporal (IT) population encodes all measured category-orthogonal object properties, including those properties often considered to be low-level features (for example, position), more explicitly than earlier ventral stream areas. We also found that the IT population better predicts human performance patterns across properties. A hierarchical neural network model based on simple computational principles generates these same cross-area patterns of information. Taken together, our empirical results support the hypothesis that all behaviorally relevant object properties are extracted in concert up the ventral visual hierarchy, and our computational model explains how that hierarchy might be built.
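Explicitness of a category-orthogonal property in a population is typically operationalized as cross-validated linear decodability of that property from the population response. A sketch on simulated data follows; the linear tuning model, noise level, and ridge decoder are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(3)

# Hypothetical population responses carrying a category-orthogonal property
# (horizontal object position, in degrees of visual angle) — simulated data.
n_images, n_sites = 600, 150
position = rng.uniform(-4, 4, n_images)
tuning = rng.standard_normal(n_sites)          # per-site position sensitivity
responses = np.outer(position, tuning) + rng.standard_normal((n_images, n_sites))

# "Explicitness" as cross-validated linear decodability of the property.
pred = cross_val_predict(RidgeCV(), responses, position, cv=5)
r = np.corrcoef(pred, position)[0, 1]
print(f"decoded-position correlation r = {r:.2f}")
```

Repeating this readout for each property (position, size, pose) and each area (V1, V4, IT) yields the cross-area explicitness patterns the abstract describes.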
IMPORTANCE Artificial intelligence (AI) has been applied to analysis of medical imaging in recent years, but AI to guide the acquisition of ultrasonography images is a novel area of investigation. A novel deep-learning (DL) algorithm, trained on more than 5 million examples of the outcome of ultrasonographic probe movement on image quality, can provide real-time prescriptive guidance for novice operators to obtain limited diagnostic transthoracic echocardiographic images. OBJECTIVE To test whether novice users could obtain 10-view transthoracic echocardiographic studies of diagnostic quality using this DL-based software. DESIGN, SETTING, AND PARTICIPANTS This prospective, multicenter diagnostic study was conducted in 2 academic hospitals. A cohort of 8 nurses who had not previously conducted echocardiograms was recruited and trained with AI. Each nurse scanned 30 patients aged at least 18 years who were scheduled to undergo a clinically indicated echocardiogram at Northwestern Memorial Hospital or Minneapolis Heart Institute between March and May 2019. These scans were compared with those of sonographers using the same echocardiographic hardware but without AI guidance. INTERVENTIONS Each patient underwent paired limited echocardiograms: one from a nurse without prior echocardiography experience using the DL algorithm and the other from a sonographer without the DL algorithm. Five level 3-trained echocardiographers independently and blindly evaluated each acquisition. MAIN OUTCOMES AND MEASURES Four primary end points were sequentially assessed: qualitative judgement about left ventricular size and function, right ventricular size, and the presence of a pericardial effusion. Secondary end points included 6 other clinical parameters and comparison of scans by nurses vs sonographers. RESULTS A total of 240 patients (mean [SD] age, 61 [16] years; 139 men [57.9%]; 79 [32.9%] with body mass index >30) completed the study.
Eight nurses each scanned 30 patients using the DL algorithm, producing studies judged to be of diagnostic quality for left ventricular size, function, and pericardial effusion in 237 of 240 cases (98.8%) and right ventricular size in 222 of 240 cases (92.5%). For the secondary end points, nurse and sonographer scans were not significantly different for most parameters. CONCLUSIONS AND RELEVANCE This DL algorithm allows novices without experience in ultrasonography to obtain diagnostic transthoracic echocardiographic studies for evaluation of left ventricular size and function, right ventricular size, and presence of a nontrivial pericardial effusion, expanding the reach of echocardiography to clinical settings in which immediate interrogation of anatomy and cardiac function is needed and to settings with limited resources.
Erythrocytes (red blood cells) play an essential role in the respiratory functions of vertebrates, carrying oxygen from lungs to tissues and CO2 from tissues to lungs. They are mechanically very soft, enabling circulation through small capillaries. The small thermally induced displacements of the membrane provide an important tool in the investigation of the mechanics of the cell membrane. However, despite numerous studies, uncertainties in the interpretation of the data, and in the values derived for the main parameters of cell mechanics, have rendered past conclusions from the fluctuation approach somewhat controversial. Here we revisit the experimental method and theoretical analysis of fluctuations, to adapt them to the case of cell contour fluctuations, which are readily observable experimentally. This enables direct measurements of membrane tension, of bending modulus, and of the viscosity of the cell cytoplasm. Of the various factors that influence the mechanical properties of the cell, we focus here on (1) the level of oxygenation, as monitored by Raman spectrometry; (2) cell shape; and (3) the concentration of hemoglobin. The results show that, contrary to previous reports, there is no significant difference in cell tension and bending modulus between oxygenated and deoxygenated states, in line with the softness requirement for optimal circulatory flow in both states. On the other hand, tension and bending moduli of discocyte- and spherocyte-shaped cells differ markedly, in both the oxygenated and deoxygenated states. The tension in spherocytes is much higher, consistent with recent theoretical models that describe the transitions between red blood cell shapes as a function of membrane tension. Cell cytoplasmic viscosity is strongly influenced by the hydration state. The implications of these results for circulatory flow dynamics in physiological and pathological conditions are discussed.
Background: Echocardiographic quantification of left ventricular (LV) ejection fraction (EF) relies on either manual or automated identification of endocardial boundaries followed by model-based calculation of end-systolic and end-diastolic LV volumes. Recent developments in artificial intelligence have resulted in computer algorithms that allow near automated detection of endocardial boundaries and measurement of LV volumes and function. However, boundary identification is still prone to errors that limit accuracy in certain patients. We hypothesized that a fully automated machine learning algorithm could circumvent border detection and instead estimate the degree of ventricular contraction, similar to a human expert trained on tens of thousands of images. Methods: A machine learning algorithm was developed and trained to automatically estimate LVEF on a database of >50,000 echocardiographic studies, including multiple apical 2- and 4-chamber views (AutoEF, BayLabs). Testing was performed on an independent group of 99 patients, whose automated EF values were compared with reference values obtained by averaging measurements by 3 experts using the conventional volume-based technique. Inter-technique agreement was assessed using linear regression and Bland-Altman analysis. Consistency was assessed by mean absolute deviation among automated estimates from different combinations of apical views. Finally, sensitivity and specificity of detecting EF ≤35% were calculated. These metrics were compared side-by-side against the same reference standard to those obtained from conventional EF measurements by clinical readers. Results: Automated estimation of LVEF was feasible in all 99 patients. AutoEF values showed high consistency (mean absolute deviation = 2.9%) and excellent agreement with the reference values: r = 0.95, bias = 1.0%, limits of agreement = ±11.8%, with sensitivity 0.90 and specificity 0.92 for detection of EF ≤35%.
This was similar to clinicians’ measurements: r = 0.94, bias = 1.4%, limits of agreement = ±13.4%, sensitivity 0.93, specificity 0.87. Conclusions: A machine learning algorithm for volume-independent LVEF estimation is highly feasible and similar in accuracy to conventional volume-based measurements, when compared with reference values provided by an expert panel.
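Agreement statistics of the kind reported above (Bland-Altman bias and limits of agreement, plus sensitivity and specificity at an EF ≤35% cutoff) can be computed as sketched below. The paired measurements here are simulated for illustration only; they are not the study data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical paired EF measurements (%): expert reference vs automated
# estimates with a small positive bias and random measurement error.
n = 99
reference = rng.uniform(20, 70, n)
automated = reference + 1.0 + 3.0 * rng.standard_normal(n)

# Bland-Altman agreement: mean difference (bias) and 95% limits of agreement.
diff = automated - reference
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)
print(f"bias = {bias:.1f}%, limits of agreement = ±{loa:.1f}%")

# Sensitivity/specificity for detecting reduced EF (<=35%) against reference.
true_low = reference <= 35
pred_low = automated <= 35
sensitivity = (pred_low & true_low).sum() / true_low.sum()
specificity = (~pred_low & ~true_low).sum() / (~true_low).sum()
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Limits of agreement scale with the standard deviation of the paired differences, which is why a method can show a small bias yet still have wide limits, as in both sets of results above.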