Deep learning has revolutionized data science, and in recent years its popularity has grown exponentially, as has the number of papers employing deep networks. Vision tasks, such as human pose estimation, have not escaped this trend. There is a large number of deep models in which small changes in the network architecture, or in the data pre-processing, together with the stochastic nature of the optimization procedures, produce notably different results, making it extremely difficult to sift out the methods that significantly outperform others. This situation motivates the current study, in which we perform a systematic evaluation and statistical analysis of vanilla deep regression, i.e. convolutional neural networks with a linear regression top layer. This is the first comprehensive analysis of deep regression techniques. We perform experiments on four vision problems, and report confidence intervals for the median performance as well as the statistical significance of the results, if any. Surprisingly, the variability due to different data pre-processing procedures generally eclipses the variability due to modifications in the network architecture. Our results reinforce the hypothesis that, in general, an adequately tuned general-purpose network (e.g. VGG-16 or ResNet-50) can yield results close to the state of the art without having to resort to more complex and ad-hoc regression models.
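The "vanilla deep regression" setup described above can be sketched as a fixed convolutional feature extractor followed by a linear regression top layer. The sketch below is a minimal illustration, not the paper's pipeline: the pre-trained backbone (e.g. VGG-16 or ResNet-50) is replaced by a stand-in random projection with a ReLU, and the top layer is fitted in closed form, with synthetic data throughout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone (e.g. VGG-16 or ResNet-50):
# a fixed random projection followed by a ReLU yields 64-d "features".
W_backbone = rng.normal(size=(128, 64))

def backbone(x):
    return np.maximum(x @ W_backbone, 0.0)  # ReLU feature map

# Synthetic stand-in data: 128-d inputs, 2-d continuous targets.
X = rng.normal(size=(500, 128))
Y = rng.normal(size=(500, 2))

# "Vanilla deep regression": a linear regression top layer fitted on
# the backbone features (closed-form least squares for illustration;
# in practice the whole network is trained end-to-end with SGD).
F = backbone(X)
W_top, *_ = np.linalg.lstsq(F, Y, rcond=None)
pred = F @ W_top
```

Because the top layer is linear, any variability in results then comes from the feature extractor, the data pre-processing, and the stochastic optimizer, which is exactly what the study isolates.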
We have developed a technique to study how well computers can diagnose gastrointestinal lesions from regular (white-light and narrow-band) colonoscopic videos, compared to two levels of clinical knowledge (expert and beginner). Our technique includes a novel tissue classification approach that may save clinicians' time by avoiding chromoendoscopy, a time-consuming staining procedure using indigo carmine. It also discriminates the severity of individual lesions in patients with many polyps, so that the gastroenterologist can focus directly on those requiring polypectomy. Technically, we have designed and developed a framework combining machine learning and computer vision algorithms that performs a virtual biopsy of hyperplastic lesions, serrated adenomas and adenomas. Serrated adenomas are very difficult to classify due to their mixed/hybrid nature, and recent studies indicate that they can lead to colorectal cancer through the alternate serrated pathway. Our approach is a first step towards avoiding systematic biopsy of suspected hyperplastic tissues. We also propose a database of colonoscopic videos showing gastrointestinal lesions, with ground truth collected from both expert image inspection and histology. We not only compare our system with the expert predictions, but also study whether the use of 3D shape features improves classification accuracy, and compare our technique's performance with that of three competing methods.
This paper introduces a novel neural-network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and to adapt its gaze control strategy for human-robot interaction without the use of external sensors or human supervision. The robot learns to focus its attention on groups of people from its own audio-visual experiences, independently of the number of people, their positions and their physical appearances. In particular, we use a recurrent neural network architecture in combination with Q-learning to find an optimal action-selection policy; we pre-train the network using a simulated environment that mimics realistic scenarios involving speaking and silent participants, thus avoiding the need for tedious sessions of a robot interacting with people. Our experimental evaluation suggests that the proposed method is robust with respect to parameter estimation, i.e. the parameter values yielded by the method do not have a decisive impact on the performance. The best results are obtained when audio and visual information are used jointly. Experiments with the Nao robot indicate that our framework is a step towards the autonomous learning of socially acceptable gaze behavior.
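The Q-learning component can be illustrated with a minimal tabular sketch. Everything below is a hypothetical toy setting, not the paper's actual state/action spaces or reward: three discrete "attention" states, two gaze actions, and a simulated environment that rewards turning toward a heard speaker.

```python
import random

random.seed(0)

# Hypothetical toy setting (illustrative names, not the paper's):
# states 0..2 = {no speaker, speaker heard, speaker in view};
# actions = {0: keep gaze, 1: turn toward sound}.
N_STATES, N_ACTIONS = 3, 2
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2   # learning rate, discount, exploration

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action):
    # Toy simulated environment: turning toward a heard speaker
    # (state 1, action 1) is rewarded and brings the speaker in view.
    if state == 1 and action == 1:
        return 2, 1.0
    return random.randrange(N_STATES), 0.0

state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if random.random() < EPS:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    nxt, reward = step(state, action)
    # Q-learning temporal-difference update
    Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][action])
    state = nxt
```

In the paper the table is replaced by a recurrent network that maps audio-visual observations to Q-values, but the temporal-difference update has this same shape.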
Compelling evidence indicates the existence of bidirectional communication between astrocytes and neurons. Astrocytes, a type of glial cell classically considered to be passive supportive cells, have recently been demonstrated to be actively involved in the processing and regulation of synaptic information, suggesting that brain function arises from the activity of neuron-glia networks. However, the actual impact of astrocytes on neural network function is largely unknown, and its application in artificial intelligence remains untested. We have investigated the consequences of including artificial astrocytes, which present the biologically defined properties involved in astrocyte-neuron communication, on artificial neural network performance. Using connectionist systems and evolutionary algorithms, we have compared the performance of artificial neural networks (NN) and artificial neuron-glia networks (NGN) in solving classification problems. We show that NGN are more successful than NN. Analysis of the performance of NN with different numbers of neurons or different architectures indicates that the effect of NGN cannot be accounted for by an increased number of network elements; rather, it is specifically due to astrocytes. Furthermore, the relative efficacy of NGN vs. NN increases as the complexity of the network increases. These results indicate that artificial astrocytes improve neural network performance, and establish the concept of Artificial Neuron-Glia Networks, a novel concept in Artificial Intelligence with implications for computational science as well as for the understanding of brain function.
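One way to picture an artificial astrocyte is as a unit that monitors sustained presynaptic activity and transiently boosts the corresponding synaptic weights. The sketch below is only an illustrative reading of astrocyte-neuron communication under that assumption; the specific rule, thresholds and gains are invented here and are not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy astrocyte-modulated artificial neuron (illustrative only):
# a weight is boosted while its input has been active for several
# consecutive steps, and the boost decays back to 1 otherwise.
n_inputs = 4
w = rng.normal(size=n_inputs)
activity = np.zeros(n_inputs)   # consecutive-activation counters
boost = np.ones(n_inputs)       # astrocyte multiplicative gain

def neuron_glia_step(x, threshold=3, gain=1.5, decay=0.9):
    global activity, boost
    active = x > 0
    activity = np.where(active, activity + 1, 0)
    # Astrocyte: amplify synapses with sustained presynaptic activity.
    boost = np.where(activity >= threshold, gain, 1 + (boost - 1) * decay)
    return float((w * boost) @ x)

out = 0.0
for t in range(5):
    out = neuron_glia_step(np.array([1.0, 0.0, 1.0, 0.0]))
```

After a few steps the persistently active synapses (inputs 0 and 2) carry the boosted gain while the silent ones do not, so the network's effective connectivity depends on recent activity history and not only on the learned weights.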
Convolutional Neural Networks (ConvNets) have become the state of the art for many classification and regression problems in computer vision. When it comes to regression, an output layer that minimizes the Euclidean distance between targets and predictions is often employed. In this paper, we propose coupling a Gaussian mixture of linear inverse regressions with a ConvNet, and we describe the methodological foundations and the associated algorithm to jointly train the deep network and the regression function. We test our model on the head-pose estimation problem. On this particular problem, we show that inverse regression outperforms the regression models currently used by state-of-the-art computer vision methods. Our method does not require the incorporation of additional data, as is often proposed in the literature, and is thus able to work well on relatively small training datasets. Finally, it outperforms state-of-the-art methods in head-pose estimation on a widely used head-pose dataset. To the best of our knowledge, we are the first to incorporate inverse regression into deep learning for computer vision applications.
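The inverse-regression idea can be illustrated in a stripped-down form: instead of regressing the pose directly from high-dimensional features, fit the easier low-to-high mapping from pose to features and invert it at test time. The sketch below uses a single linear inverse regression on synthetic data, whereas the paper uses a Gaussian mixture of such regressions trained jointly with the ConvNet; all variable names and dimensions here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3-d pose (yaw, pitch, roll) generating
# 32-d "ConvNet features" through an unknown linear map plus noise.
pose = rng.uniform(-1, 1, size=(300, 3))
A_true = rng.normal(size=(3, 32))
feats = pose @ A_true + 0.01 * rng.normal(size=(300, 32))

# Inverse regression: fit the low-dimensional pose -> high-dimensional
# feature mapping (a far better-posed least-squares problem than the
# direct 32 -> 3 regression on small datasets).
A_hat, *_ = np.linalg.lstsq(pose, feats, rcond=None)   # shape (3, 32)

# Inversion at test time via the pseudo-inverse of the fitted map.
test_pose = np.array([[0.3, -0.2, 0.1]])
test_feats = test_pose @ A_true
recovered = test_feats @ np.linalg.pinv(A_hat)
```

A mixture of several such local linear inverse maps, each with its own Gaussian responsibility, lets the model handle the nonlinear feature-to-pose relationship while keeping each component's estimation problem low-dimensional.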
Deformable models are segmentation techniques that adapt a curve with the goal of maximizing its overlap with the actual contour of an object of interest within an image. Such a process requires the definition of an optimization framework whose most critical issues include: choosing an optimization method that is robust with respect to noisy and highly multimodal search spaces; selecting the parameters of the optimization and segmentation algorithms; choosing the representation for encoding prior knowledge about the image domain of interest; and initializing the curve in a location that favors its convergence onto the boundary of the object of interest. All these problems are extensively discussed in this manuscript, with reference to the family of global stochastic optimization techniques generally termed metaheuristics, which are designed to solve complex optimization and machine learning problems. In particular, we present a complete study of the application of metaheuristics to image segmentation based on deformable models. This survey studies, analyzes and contextualizes the most notable and recent works on the topic, proposing an original categorization of these hybrid approaches. It aims to serve as a reference work offering guidelines for choosing and designing the most appropriate combination of deformable models and metaheuristics for a given segmentation problem. After recalling the principles underlying deformable models and metaheuristics, we broadly review the different metaheuristic-based approaches to image segmentation based on deformable models, and conclude with a general discussion of methodological and design issues as well as future research and application trends.
[…] image segmentation using geometric deformable models and metaheuristics. Computerized Medical Imaging and Graphics, Elsevier, 2015, 43, pp. 167-178, 10.1016/j.compmedimag.2013
This paper describes a hybrid level set approach for medical image segmentation. This new geometric deformable model combines region- and edge-based information with prior shape knowledge introduced using deformable registration. Our proposal consists of two phases: training and testing. The former involves learning the level set parameters by means of a Genetic Algorithm, while the latter is the segmentation itself, in which another metaheuristic, in this case Scatter Search, derives the shape prior. In an experimental comparison, this approach has shown better performance than a number of state-of-the-art methods when segmenting anatomical structures from different biomedical image modalities.
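The training phase, learning real-valued level set parameters with a Genetic Algorithm, can be sketched with a minimal GA. The fitness function below is a stand-in quadratic target rather than a real segmentation-error measure, and the population sizes, operators and parameter count are illustrative assumptions.

```python
import random

random.seed(0)

# Stand-in for the real objective: distance of a 3-parameter vector
# to a hypothetical "ideal" parameter setting (lower is better). In
# the paper the fitness would be a segmentation error on training images.
TARGET = [0.5, -0.2, 0.8]

def fitness(params):
    return sum((p - t) ** 2 for p, t in zip(params, TARGET))

def mutate(params, sigma=0.1):
    # Gaussian mutation of each gene.
    return [p + random.gauss(0, sigma) for p in params]

def crossover(a, b):
    # Uniform crossover: pick each gene from one of the two parents.
    return [random.choice(pair) for pair in zip(a, b)]

# Random initial population of 30 candidate parameter vectors.
pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(30)]
for gen in range(100):
    pop.sort(key=fitness)
    elite = pop[:10]                                   # elitist selection
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(20)]
    pop = elite + children

best = min(pop, key=fitness)
```

Elitism guarantees that the best candidate never gets worse between generations, which matters when each fitness evaluation (a full segmentation run) is expensive.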