A Taxonomy of Deep Convolutional Neural Nets for Computer Vision

Srinivas, S.; Sarvadevabhatla, Ravi Kiran; Mopuri, Konda Reddy; Prabhu, Nikita; Kruthiventi, Srinivas S. S.; Babu, R. Venkatesh

doi:10.3389/frobt.2015.00036

Cited by 179 publications

(96 citation statements)

References 99 publications

“…The survey reported in [41] reviewed the famous architectures from 2012-2015 along with their components. Similarly, there are prominent surveys that discuss different algorithms and applications of CNN [14], [16], [17], [42], [43]. Likewise, the survey presented in [44] discusses taxonomy of CNNs based on acceleration techniques.…”

Section: Introduction (mentioning)

confidence: 99%

A survey of the recent architectures of deep convolutional neural networks

et al. 2020

Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown exemplary performance on several competitions related to Computer Vision and Image Processing. Interesting application areas of CNN include Image Classification and Segmentation, Object Detection, Video Processing, Natural Language Processing, Speech Recognition, etc. The powerful learning ability of deep CNN is largely due to the use of multiple feature extraction stages that can automatically learn representations from the data. Availability of a large amount of data and improvements in the hardware technology have accelerated the research in CNNs, and recently very interesting deep CNN architectures have been reported. In fact, several interesting ideas to bring advancements in CNNs have been explored such as the use of different activation and loss functions, parameter optimization, regularization, and architectural innovations. However, the major improvement in representational capacity of the deep CNN is achieved through architectural innovations. Especially, the idea of exploiting spatial and channel information, depth and width of architecture, and multi-path information processing has gained substantial attention. Similarly, the idea of using a block of layers as a structural unit is also gaining popularity. This survey thus focuses on the intrinsic taxonomy present in the recently reported deep CNN architectures and consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention. Additionally, the elementary understanding of CNN components, current challenges and applications of CNN are also provided. 
CNNs are the best among learning algorithms in understanding image content, and have shown exemplary results in segmentation, classification, detection, and retrieval related tasks [8], [9]. The success of CNNs has captured attention beyond academia. In industry, companies such as Google, Microsoft, AT&T, NEC, and Facebook have developed active research groups for exploring new architectures of CNN [10]. At present, most of the frontrunners of image processing and computer vision competitions are employing deep CNN based models. The attractive feature of CNN is its ability to exploit spatial or time correlation of the data. The topology of CNN is divided into multiple learning stages composed of a combination of the convolutional layers, non-linear processing units, and subsampling layers [11]. CNNs are feedforward multilayered hierarchical networks that are similar to a fully connected neural network where each layer, using a bank of convolutional kernels, performs multiple transformations [12]. The convolution operation extracts useful features from locally correlated data points. The output of the convolutional kernels is assigned to a non-linear processing unit (activation function), which not only helps in learning abstractions but also emb...
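The learning stage described above (convolution, then a non-linear unit, then subsampling) can be illustrated with a minimal pure-Python sketch. This is not code from the cited survey: the function names, the choice of ReLU as the activation, and the 2x2 max-pooling window are assumptions made only for illustration.

```python
# Illustrative sketch of one CNN learning stage: convolution -> non-linearity -> subsampling.
# All names here are hypothetical; ReLU and 2x2 max pooling are assumed choices.

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding.

    Computes cross-correlation (no kernel flip), which is what most
    CNN libraries call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise non-linearity: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def learning_stage(image, kernel):
    """Compose the three steps into a single stage."""
    return max_pool(relu(conv2d_valid(image, kernel)))
```

Calling `learning_stage(image, kernel)` on a 5x5 input with a 2x2 kernel yields a 4x4 feature map after convolution and a 2x2 map after pooling; a real network stacks several such stages and learns the kernel values from data.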

“…A widely used classifier is the SoftMax function. Compared to the rest of the network, its computational complexity is usually small [4] [30]. The first layer connects the network to the input volume which can be an image, a video frame, or a signal, depending on the application (a 3-channel R,G,B image for instance).…”

Section: A Convolutional Neural Network (mentioning)

confidence: 99%

“…For the sake of generality, STREAM SUM, STREAM SCALE, STREAM SHIFT, and STREAM MIN are implemented, as well. Another widely used operation in Conv-Nets is pooling [4]. NST supports max-pooling [53] through the STREAM MAXPL command.…”

Section: A Inference With Nsts (mentioning)

confidence: 99%

“…These companies are interested in running such algorithms on powerful compute clusters in large data centers. Convolutional neural networks (ConvNets) are known as the SoA ML algorithms specialized at BIC [4]. ConvNets process raw data directly, combining the classical models of feature extraction and classification into a single algorithm.…”

Section: Introduction (mentioning)

confidence: 99%

Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes

Azarkhish, Rossi, Loi, et al. 2018

IEEE Trans. Parallel Distrib. Syst.

Abstract: High-performance computing systems are moving towards 2.5D and 3D memory hierarchies, based on High Bandwidth Memory (HBM) and Hybrid Memory Cube (HMC) to mitigate the main memory bottlenecks. This trend is also creating new opportunities to revisit near-memory computation. In this paper, we propose a flexible processor-in-memory (PIM) solution for scalable and energy-efficient execution of deep convolutional networks (ConvNets), one of the fastest-growing workloads for servers and high-end embedded systems. Our codesign approach consists of a network of Smart Memory Cubes (modular extensions to the standard HMC) each augmented with a many-core PIM platform called NeuroCluster. NeuroClusters have a modular design based on NeuroStream coprocessors (for Convolution-intensive computations) and general-purpose RISC-V cores. In addition, a DRAM-friendly tiling mechanism and a scalable computation paradigm are presented to efficiently harness this computational capability with a very low programming effort. NeuroCluster occupies only 8% of the total logic-base (LoB) die area in a standard HMC and achieves an average performance of 240 GFLOPS for complete execution of full-featured state-of-the-art (SoA) ConvNets within a power budget of 2.5 W. Overall 11 W is consumed in a single SMC device, with 22.5 GFLOPS/W energy-efficiency which is 3.5X better than the best GPU implementations in similar technologies. The minor increase in system-level power and the negligible area increase make our PIM system a cost-effective and energy efficient solution, easily scalable to 955 GFLOPS with a small network of just four SMCs.

“…Previous to the resurgence of CNN models [27], commonly followed computer vision approaches in VPR employed handcrafted robust features such as SIFT [28], SURF [28], ORB [29], etc. to represent images, encoding them into BoW-like models by using pre-trained dictionaries of visual words [4,11,13,30].…”

Section: Related Work (mentioning)

confidence: 99%

Spatio-Semantic ConvNet-Based Visual Place Recognition

Camara, Přeučil 2019

2019 European Conference on Mobile Robots (ECMR)

We present a Visual Place Recognition system that follows the two-stage format common to image retrieval pipelines. The system encodes images of places by employing the activations of different layers of a pre-trained, off-the-shelf, VGG16 Convolutional Neural Network (CNN) architecture. In the first stage of our method and given a query image of a place, a number of top candidate images is retrieved from a previously stored database of places. In the second stage, we propose an exhaustive comparison of the query image against these candidates by encoding semantic and spatial information in the form of CNN features. Results from our approach outperform by a large margin state-of-the-art visual place recognition methods on five of the most commonly used benchmark datasets. The performance gain is especially remarkable on the most challenging datasets, with more than a twofold recognition improvement with respect to the latest published work.
