Xiangyu Zhang scite author profile

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [29]). To our knowledge, our result is the first to surpass human-level performance (5.1%, [22]) on this visual recognition challenge.
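The two ideas in this abstract can be illustrated with a small sketch. The abstract does not state the formulas, so the following relies on assumptions: the commonly cited form of PReLU, f(y) = max(0, y) + a·min(0, y), and a zero-mean Gaussian initialization with standard deviation sqrt(2 / ((1 + a²)·fan_in)). The function names `prelu`, `rectifier_init_std`, and `init_weights` are hypothetical and chosen only for this illustration.

```python
import math
import random


def prelu(y, a):
    """PReLU (assumed form): identity for positive inputs, slope `a` for negative ones."""
    return max(0.0, y) + a * min(0.0, y)


def rectifier_init_std(fan_in, a=0.0):
    """Std of a zero-mean Gaussian init for a layer followed by a rectifier.

    Assumed form: sqrt(2 / ((1 + a^2) * fan_in)); a = 0 corresponds to plain ReLU.
    """
    return math.sqrt(2.0 / ((1.0 + a * a) * fan_in))


def init_weights(fan_in, fan_out, a=0.0, seed=0):
    """Draw a fan_out x fan_in weight matrix (list of rows) with the std above."""
    rng = random.Random(seed)
    std = rectifier_init_std(fan_in, a)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


if __name__ == "__main__":
    print(prelu(2.0, 0.25), prelu(-2.0, 0.25))
    print(rectifier_init_std(8))
```

With `a = 0` the activation reduces to an ordinary ReLU, which is the sense in which PReLU generalizes the traditional rectified unit. In a real network `a` would be a learned parameter; here it is a fixed argument for simplicity.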

Identity Mappings in Deep Residual Networks

Zhang, Ren, et al. 2016

6,599

5,354

Deep residual networks [1] have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and a 200-layer ResNet on ImageNet. Code is available at: https://github.com/KaimingHe/resnet-1k-layers.
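The direct-propagation claim can be checked on a toy example. This is a minimal sketch, not the paper's architecture: it assumes each unit computes x_{l+1} = x_l + F(x_l) with an identity skip connection and nothing applied after the addition, and it uses a hypothetical stand-in for the residual branch F (a ReLU followed by a scalar weight). Under those assumptions the output of the stack equals the input plus the sum of all residuals.

```python
def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, t) for t in v]


def residual_branch(v, w):
    """Hypothetical stand-in for F: activation first, then a scalar weight."""
    return [w * t for t in relu(v)]


def forward(x, weights):
    """Stack units x_{l+1} = x_l + F(x_l) with identity skips.

    Returns the final activation and the list of per-unit residuals.
    """
    residuals = []
    for w in weights:
        r = residual_branch(x, w)
        residuals.append(r)
        # Identity skip: nothing is applied after the addition.
        x = [xi + ri for xi, ri in zip(x, r)]
    return x, residuals


if __name__ == "__main__":
    x0 = [1.0, -2.0]
    out, residuals = forward(x0, [0.5, 0.25, -1.0])
    total = [x0[i] + sum(r[i] for r in residuals) for i in range(len(x0))]
    print(out, total)
```

Because the skip path is an identity, the input reaches the last unit unchanged apart from additive terms, which is the property the abstract attributes to identity mappings.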

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Zhang, Ren, et al. 2015

IEEE Trans. Pattern Anal. Mach. Intell.

8,469

4,022

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224 × 224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102× faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.
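The fixed-length property can be sketched on a single-channel feature map. This is an illustration under assumptions, not the SPP-net implementation: the pyramid levels `(1, 2)`, the use of max pooling, and the floor/ceiling rule for bin boundaries are choices made for this example, and `spp_max` is a hypothetical name.

```python
def spp_max(feature_map, levels=(1, 2)):
    """Max-pool a 2D feature map (list of rows) into n x n bins per pyramid level.

    The output length is sum(n * n for n in levels), independent of input size.
    Bin boundaries (floor start, ceiling end) are an assumption of this sketch.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0 = (i * h) // n
            # Integer ceiling of (i + 1) * h / n; each bin spans at least one row.
            r1 = max(r0 + 1, ((i + 1) * h + n - 1) // n)
            for j in range(n):
                c0 = (j * w) // n
                c1 = max(c0 + 1, ((j + 1) * w + n - 1) // n)
                pooled.append(
                    max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return pooled


if __name__ == "__main__":
    square = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # 3 x 3 input
    wide = [[1, 2, 3, 4], [5, 6, 7, 8]]          # 2 x 4 input
    print(spp_max(square))
    print(spp_max(wide))
```

Both inputs produce a vector of length 5 (one bin from the 1 × 1 level plus four from the 2 × 2 level), so a fully connected layer placed after the pooling would see the same input size for either image shape.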

Channel Pruning for Accelerating Very Deep Neural Networks

2017

In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by a LASSO regression based channel selection and least square reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances the compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results, with a 5× speed-up and only a 0.3% increase in error. More importantly, our method is able to accelerate modern networks such as ResNet and Xception, which suffer only 1.4% and 1.0% accuracy loss, respectively, under a 2× speed-up. Code has been made publicly available.

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

et al. 2014

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224×224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102× faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.

Single Path One-Shot Neural Architecture Search with Uniform Sampling

Guo, Zhang, Mu, et al. 2020

439

513

Basolateral to Central Amygdala Neural Circuits for Appetitive Behaviors

Kim, Zhang, Muralidhar, et al. 2017

Neuron

347

455

Basolateral amygdala (BLA) principal cells are capable of driving and antagonizing behaviors of opposing valence. BLA neurons project to the central amygdala (CeA), which also participates in negative and positive behaviors. However, the CeA has primarily been studied as the site for negative behaviors, and the causal role for CeA circuits underlying appetitive behaviors is poorly understood. Here we identified several genetically distinct populations of CeA neurons that mediate appetitive behaviors and dissected the BLA to CeA circuit for appetitive behaviors. Protein phosphatase 1 regulatory subunit 1B+ BLA pyramidal neurons to dopamine receptor 1+ CeA neurons define a pathway for promoting appetitive behaviors, while R-spondin 2+ BLA pyramidal neurons to dopamine receptor 2+ CeA neurons define a pathway for suppressing appetitive behaviors. These data reveal genetically defined neural circuits in the amygdala that promote and suppress appetitive behaviors analogous to the direct and indirect pathway of the basal ganglia.

Accelerating Very Deep Convolutional Networks for Classification and Detection

Zhang, Zou, et al. 2016

IEEE Trans. Pattern Anal. Mach. Intell.

686

434

This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs [1] that have substantially impacted the computer vision community. Unlike previous methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We develop an effective solution to the resulting nonlinear optimization problem without the need for stochastic gradient descent (SGD). More importantly, while previous methods mainly focus on optimizing one or two layers, our nonlinear method enables an asymmetric reconstruction that reduces the rapidly accumulated error when multiple (e.g., ≥10) layers are approximated. For the widely used very deep VGG-16 model [1], our method achieves a whole-model speedup of 4× with merely a 0.3% increase of top-5 error in ImageNet classification. Our 4× accelerated VGG-16 model also shows a graceful accuracy degradation for object detection when plugged into the Fast R-CNN detector [2].
