2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01329
How does topology influence gradient propagation and model performance of deep networks with DenseNet-type skip connections?

Cited by 12 publications (23 citation statements)
References 19 publications
“…We denote the network width as w_c = [2, 3, 4]. Finally, the maximum number of channels that can supply skip connections is given by t_c = [2, 5, 6]. That is, the first cell can have a maximum of two skip connection candidates per layer (i.e., previous channels that can supply skip connections), the second cell can have a maximum of five skip connection candidates per layer, and so on.…”
Section: Methods
confidence: 99%
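For concreteness, below is a minimal Python sketch of how the per-cell width w_c and skip-candidate limit t_c quoted above could be used to enumerate skip-connection candidates within a cell. The `sample_skip_connections` helper and the four-layer cell are illustrative assumptions made for this report, not the cited paper's implementation.

```python
# Illustrative sketch only (hypothetical helper, not the cited paper's code).
import random

width = [2, 3, 4]                # w_c: channels per layer in each of the three cells
max_skip_candidates = [2, 5, 6]  # t_c: max previous channels that may feed skip connections


def sample_skip_connections(num_layers, w, t, seed=0):
    """For one cell, pick at most `t` earlier channels per layer as skip inputs."""
    rng = random.Random(seed)
    picks = []
    for layer in range(1, num_layers):
        # Every channel produced by an earlier layer of this cell is a candidate.
        candidates = [(prev_layer, ch) for prev_layer in range(layer) for ch in range(w)]
        picks.append(rng.sample(candidates, min(t, len(candidates))))
    return picks


# Example: a 4-layer cell using the first cell's settings (w_c = 2, t_c = 2).
print(sample_skip_connections(num_layers=4, w=width[0], t=max_skip_candidates[0]))
```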
“…Similarly, a DNN architecture can be seen as a network of connected neurons. As discussed in [5], the topology of deep networks has a significant impact on how effectively gradients can propagate through the network, and thus on the test performance of neural networks. These observations motivate us to take an approach from network science and quantify topological properties of neural networks to accelerate NAS.…”
Section: Introduction
confidence: 99%
“…Training ViTs is slow; hence, an architecture search guided by evaluating trained models' accuracies will be dauntingly expensive. We note a recent surge of training-free neural architecture search methods for ReLU-based CNNs, leveraging local linear maps (Mellor et al., 2020), gradient sensitivity (Abdelfattah et al., 2021), the number of linear regions (Chen et al., 2021e;f), or network topology (Bhardwaj et al., 2021). However, ViTs are equipped with more complex non-linear functions: self-attention, softmax, and GeLU.…”
Section: Assessing ViT Complexity at Initialization via Manifold Prop...
confidence: 99%
“…Also, some interesting phenomena (Frankle et al., 2020) are observed during the early phase of NN training, such as the emergence of trainable sparse sub-networks (Frankle et al., 2019), gradient descent moving into a small subspace (Gur-Ari et al., 2018), and the existence of a critical effective connection between layers (Achille et al., 2019). Bhardwaj et al. (2021) built a nice connection between architectures (with concatenation-type skip connections) and their performance, and proposed a new topological metric to identify NNs with similar accuracy. Many of these studies are built on dynamical systems and network science.…”
Section: Epoch
confidence: 99%
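As a rough illustration of a topology-based summary for such architectures, the sketch below computes the per-cell density of realized skip connections relative to the available candidates. This is a hypothetical stand-in written for this report, not the specific metric proposed by Bhardwaj et al. (2021); the `skip_density` function and its arguments are assumptions.

```python
# Hypothetical illustration (not the metric from Bhardwaj et al., 2021):
# summarize a DenseNet-type cell by the density of its skip connections,
# i.e. realized skip inputs divided by available skip candidates.

def skip_density(num_layers, width, skips_per_layer):
    """Fraction of possible skip inputs that a cell actually uses.

    num_layers      : number of layers in the cell
    width           : channels produced by each layer (w_c)
    skips_per_layer : skip inputs actually wired into each layer
    """
    possible = sum(layer * width for layer in range(1, num_layers))
    used = sum(skips_per_layer[1:num_layers])
    return used / possible if possible else 0.0


# Example: a 4-layer cell with w_c = 2 where layers 1..3 each take 2 skip inputs.
print(skip_density(num_layers=4, width=2, skips_per_layer=[0, 2, 2, 2]))  # -> 0.5
```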