We propose a novel crowd counting model that maps a given crowd scene to its density. Crowd analysis is compounded by a myriad of factors, such as inter-occlusion between people due to extreme crowding, high similarity of appearance between people and background elements, and large variability of camera viewpoints. Current state-of-the-art approaches tackle these factors by using multi-scale CNN architectures, recurrent networks, and late fusion of features from multi-column CNNs with different receptive fields. We propose a switching convolutional neural network that leverages the variation of crowd density within an image to improve the accuracy and localization of the predicted crowd count. Patches from a grid within a crowd scene are relayed to independent CNN regressors based on the crowd count prediction quality of each CNN established during training. The independent CNN regressors are designed to have different receptive fields, and a switch classifier is trained to relay each crowd scene patch to the best CNN regressor. We perform extensive experiments on all major crowd counting datasets and demonstrate better performance compared to current state-of-the-art methods. We provide interpretable representations of the multichotomy of the space of crowd scene patches inferred from the switch. We observe that the switch relays an image patch to a particular CNN column based on the density of the crowd in the patch.
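A minimal sketch of the switching mechanism described above, assuming a batch of patches, a set of regressor columns with different receptive fields, and a hypothetical switch classifier passed in by the caller; module names are illustrative, not the paper's.

import torch
import torch.nn as nn

class SwitchCNN(nn.Module):
    """Sketch: a switch classifier routes each image patch to one of
    several CNN regressor columns; each column predicts a density map."""
    def __init__(self, columns, switch):
        super().__init__()
        self.columns = nn.ModuleList(columns)  # CNN regressors, one per density regime
        self.switch = switch                   # classifier scoring columns per patch

    def forward(self, patches):
        # Pick the column the switch deems best for each patch's crowd density.
        col_idx = self.switch(patches).argmax(dim=1)
        # Relay each patch in the batch to its selected regressor.
        out = [self.columns[i](p.unsqueeze(0))
               for i, p in zip(col_idx.tolist(), patches)]
        return torch.cat(out, dim=0)  # predicted density maps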
We present a novel deep learning architecture for fusing static multi-exposure images. Current multi-exposure fusion (MEF) approaches use hand-crafted features to fuse the input sequence. However, these weak hand-crafted representations are not robust to varying input conditions, and they perform poorly on extreme exposure image pairs. Thus, it is highly desirable to have a method that is robust to varying input conditions and capable of handling extreme exposures without artifacts. Deep representations are known to be robust to input conditions and have shown phenomenal performance in supervised settings. However, the stumbling block in using deep learning for MEF has been the lack of sufficient training data and of an oracle to provide ground truth for supervision. To address these issues, we have gathered a large dataset of multi-exposure image stacks for training, and, to circumvent the need for ground truth images, we propose an unsupervised deep learning framework for MEF that uses a no-reference quality metric as its loss function. The proposed approach uses a novel CNN architecture trained to learn the fusion operation without a reference ground truth image. The model fuses a set of common low-level features extracted from each image to generate artifact-free, perceptually pleasing results. We perform extensive quantitative and qualitative evaluation and show that the proposed technique outperforms existing state-of-the-art approaches on a variety of natural images. (The exposure bias value, EV, indicates the amount of exposure offset from a camera's auto exposure setting; for example, EV 1 equals a doubling of the auto exposure time relative to EV 0.)
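A minimal sketch of the unsupervised training step described above. The mef_ssim callable stands in for the no-reference quality metric used as the loss; its name and signature are assumptions, not the paper's exact implementation.

import torch

def train_step(fusion_net, mef_ssim, optimizer, exposure_stack):
    """One unsupervised update: no ground-truth fused image is needed.
    `fusion_net` maps a multi-exposure stack to a fused image, and
    `mef_ssim` is a differentiable no-reference quality score (higher is
    better), so we minimize its negation."""
    fused = fusion_net(exposure_stack)       # (B, C, H, W) fused output
    loss = -mef_ssim(fused, exposure_stack)  # quality metric acts as supervisor
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()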
Understanding and predicting the human visual attentional mechanism is an active area of research in the fields of neuroscience and computer vision. In this work, we propose DeepFix, a first-of-its-kind fully convolutional neural network for accurate saliency prediction. Unlike classical works, which characterize the saliency map using various hand-crafted features, our model automatically learns features in a hierarchical fashion and predicts the saliency map in an end-to-end manner. DeepFix is designed to capture semantics at multiple scales while taking global context into account, using network layers with very large receptive fields. Generally, fully convolutional nets are spatially invariant, which prevents them from modeling location-dependent patterns (e.g., the centre bias). Our network overcomes this limitation by incorporating a novel Location Biased Convolutional layer. We evaluate our model on two challenging eye fixation datasets, MIT300 and CAT2000, and show that it outperforms other recent approaches by a significant margin.
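A minimal sketch of a location-biased convolution in the spirit described above, under the assumption that the location dependence is injected by concatenating per-pixel bias maps (e.g., centre-bias Gaussians) to the features before a standard convolution; the paper's exact formulation may differ.

import torch
import torch.nn as nn

class LocationBiasedConv(nn.Module):
    """Concatenate location maps to the input so the convolution can learn
    location-dependent patterns despite the spatial invariance of ordinary
    conv layers. Maps are learnable here; fixed Gaussians would also fit."""
    def __init__(self, in_ch, out_ch, num_maps, height, width, kernel_size=3):
        super().__init__()
        # Per-pixel bias maps shared across the batch.
        self.location_maps = nn.Parameter(torch.randn(1, num_maps, height, width))
        self.conv = nn.Conv2d(in_ch + num_maps, out_ch, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):
        maps = self.location_maps.expand(x.size(0), -1, -1, -1)
        return self.conv(torch.cat([x, maps], dim=1))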
Deep neural networks (NNs) with millions of parameters are at the heart of many state-of-the-art computer vision systems today. However, recent works have shown that much smaller models can achieve similar levels of performance. In this work, we address the problem of pruning parameters in a trained NN model. Instead of removing individual weights one at a time, as done in previous works, we remove one neuron at a time. We show how similar neurons are redundant and propose a systematic way to remove them. Unlike previous works, our pruning method does not require access to any training/validation data.

Wiring similar neurons. The main principle we use in this paper is the fact that similar neurons are redundant, as shown in Figure 1. That is, if we find such a similar weight-set pair anywhere in a neural network, one of them can effectively be removed. Of course, while doing this we also need to account for the weights in the next layer, as shown in Figure 1. This observation also resonates with the well-known Hebbian principle, which roughly states that neurons that fire together ($W_1 = W_2$) wire together ($a_1 \leftarrow a_1 + a_2$).

Wiring dissimilar neurons. The above principle cannot be used as-is in real NNs, for one simple reason: weight-sets are seldom equal in value. What do we do when $W_1 - W_2 = \epsilon_{1,2} \neq 0$? Let $z_n$ be the output neuron when there are $n$ hidden neurons. Consider two similar weight-sets $W_i$ and $W_j$ contributing to $z_n$, and suppose we choose to remove $W_j$, giving us $z_{n-1}$. Using some approximate analysis, we derive a simple rule for finding which weight-sets to remove. The final equation is

$$\mathbb{E}\big[(z_n - z_{n-1})^2\big] \;\le\; \min_{i,j}\; a_j^2\,\|\epsilon_{i,j}\|_2^2 . \qquad (1)$$

We aim to minimize the expected value of the squared difference between the output neurons. Using the expected error instead of the empirical error is what makes this a data-free parameter pruning method. We define the saliency of a pair of weight-sets $(i, j)$ as $s_{i,j} = a_j^2\,\|\epsilon_{i,j}\|_2^2$, which is exactly the term inside the $\min(\cdot)$ in Equation 1. Intuitively, the saliency of two weight-sets is low when they have very similar values. Equation 1 tells us that we should remove the lowest-saliency neurons first to minimize the expected squared difference.

We elucidate our procedure for neuron removal here (a code sketch follows):
1. Compute the saliency $s_{i,j}$ for all possible pairs $(i, j)$. It can be stored as a square matrix $M$ whose dimension equals the number of neurons in the layer being considered.
2. Pick the minimum entry in the matrix. Let its indices be $(i', j')$. Delete the $j'$-th neuron and update $a_{i'} \leftarrow a_{i'} + a_{j'}$.
3. Update $M$ by removing the $j'$-th row and column and updating the $i'$-th column (to account for the updated $a_{i'}$).

Connections to other methods. Our method relates to the popular weight-pruning method called Optimal Brain Damage (OBD) [3]. In fact, our method is equivalent to OBD if a change in output activation produces a proportional change in test error; unfortunately, this is almost never the case for neural networks. Our method also weakly relates to Knowledge Distillation (KD) [1]. The idea in KD was to minimize the empirical difference in ...
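A minimal sketch of the three-step removal procedure above for a single hidden layer with a scalar output, assuming rows of W are the weight-sets and a holds the outgoing weights; for simplicity it recomputes the saliency matrix each round instead of updating it incrementally.

import numpy as np

def prune_layer(W, a, num_remove):
    """Data-free pruning sketch for one hidden layer.
    W: (n, d) weight-sets, one row per hidden neuron.
    a: (n,) outgoing weights to the next layer.
    Greedily removes `num_remove` neurons using the saliency
    s[i, j] = a_j^2 * ||W_i - W_j||_2^2 and folds a_j into a_i."""
    W, a = W.copy(), a.astype(float).copy()
    keep = list(range(len(a)))
    for _ in range(num_remove):
        Wk, ak = W[keep], a[keep]
        # Pairwise saliency matrix M, diagonal masked out.
        diff = ((Wk[:, None, :] - Wk[None, :, :]) ** 2).sum(-1)
        M = (ak ** 2)[None, :] * diff
        np.fill_diagonal(M, np.inf)
        i, j = np.unravel_index(np.argmin(M), M.shape)
        a[keep[i]] += a[keep[j]]  # wire the removed neuron's output into neuron i
        keep.pop(j)               # delete neuron j
    return W[keep], a[keep]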
A class of recent approaches for generating images, called Generative Adversarial Networks (GANs), has been used to generate impressively realistic images of objects, bedrooms, handwritten digits, and a variety of other image modalities. However, typical GAN-based approaches require large amounts of training data to capture the diversity across the image modality. In this paper, we propose DeLiGAN, a novel GAN-based architecture for diverse and limited training data scenarios. In our approach, we reparameterize the latent generative space as a mixture model and learn the mixture model's parameters along with those of the GAN. This seemingly simple modification to the GAN framework is surprisingly effective and results in models that enable diversity in generated samples even when trained with limited data. In our work, we show that DeLiGAN can generate images of handwritten digits, objects, and hand-drawn sketches, all using limited amounts of data. To quantitatively characterize the intra-class diversity of generated samples, we also introduce a modified version of the "inception score", a measure which has been found to correlate well with human assessment of generated samples.
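A minimal sketch of the latent-space reparameterization described above: instead of sampling z from a single standard normal, each sample is drawn from one component of a learned Gaussian mixture via z = mu_c + sigma_c * eps. The initialization values and module name are assumptions.

import torch
import torch.nn as nn

class MixtureLatent(nn.Module):
    """Sketch of a DeLiGAN-style latent layer: the mixture parameters
    (mu, sigma) are trained jointly with the generator's parameters."""
    def __init__(self, num_components, latent_dim):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(num_components, latent_dim))
        self.sigma = nn.Parameter(torch.full((num_components, latent_dim), 0.2))

    def forward(self, batch_size):
        # Pick a mixture component uniformly for each sample, then
        # reparameterize: z = mu_c + sigma_c * eps, with eps ~ N(0, I).
        c = torch.randint(0, self.mu.size(0), (batch_size,))
        eps = torch.randn(batch_size, self.mu.size(1))
        return self.mu[c] + self.sigma[c] * eps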
Traditional architectures for solving computer vision problems, and the degree of success they enjoyed, have been heavily reliant on hand-crafted features. However, of late, deep learning techniques have offered a compelling alternative: that of automatically learning problem-specific features. With this new paradigm, every problem in computer vision is now being re-examined from a deep learning perspective. Therefore, it has become important to understand what kinds of deep networks are suitable for a given problem. Although general surveys of this fast-moving paradigm (i.e., deep networks) exist, a survey specific to computer vision is missing. We specifically consider one form of deep network widely used in computer vision: convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN and then examine the broad variations proposed over time to suit different applications. We hope that our recipe-style survey will serve as a guide, particularly for novice practitioners intending to use deep learning techniques for computer vision.