2019
DOI: 10.48550/arxiv.1905.04919
Preprint

BayesNAS: A Bayesian Approach for Neural Architecture Search

Cited by 24 publications (15 citation statements)
References 25 publications
“…Architecture Space The DARTS operation space O contains eight choices: none (zero), skip connection, separable convolution 3 × 3 and 5 × 5, dilated separable convolution 3 × 3 and 5 × 5, max pooling 3 × 3, and average pooling 3 × 3. Following previous works (Liu et al, 2018b; Chen et al, 2019; Xu et al, 2019), for evaluation phases, we stack 20 cells to compose the network and set the …

Architecture | Test Err. (%) | Params (M) | Search Cost (GPU-days) | Search Method
(Liu et al, 2018a) | 3.41(0.09) | 3.2 | 225 | SMBO
ENAS (Pham et al, 2018) | 2.89 | 4.6 | 0.5 | RL
NASNet-A | 2.65 | 3.3 | 2000 | RL
DARTS (1st) (Liu et al, 2018b) | 3.00(0.14) | 3.3 | 0.4 | gradient
DARTS (2nd) (Liu et al, 2018b) | 2.76(0.09) | 3.3 | 1.0 | gradient
SNAS (Xie et al, 2018) | 2.85(0.02) | 2.8 | 1.5 | gradient
GDAS (Dong & Yang, 2019) | 2.82 | 2.5 | 0.17 | gradient
BayesNAS (Zhou et al, 2019) | 2.81(0.04) | 3.4 | 0.2 | gradient
ProxylessNAS (Cai et al, 2018) † | 2.08 | 5.7 | 4.0 | gradient
P-DARTS (Chen et al, 2019) | 2.50 | 3.4 | 0.3 | gradient
PC-DARTS (Xu et al, 2019) | 2.57(0.07) | 3.6 | 0.1 | gradient
SDARTS-ADV (Chen & Hsieh, 2020) | 2.61(0.02) | … | … | …

† … (Han et al, 2017) as the backbone. ‡ Recorded on a single GTX 1080Ti GPU.…”
Section: Results On CIFAR-10 With DARTS Search Space
confidence: 99%
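The excerpt above enumerates the eight candidate operations of the DARTS cell search space and the 20-cell evaluation network used by the citing work. The following Python snippet is a minimal sketch of that operation list; the string identifiers and the CANDIDATE_OPS name are illustrative assumptions, not taken from any of the cited implementations.

# Minimal sketch of the eight-operation DARTS search space named in the
# excerpt above. Operation names are illustrative labels, not the exact
# identifiers used by any particular DARTS codebase.
CANDIDATE_OPS = [
    "none",          # zero operation (drops the edge)
    "skip_connect",  # identity / skip connection
    "sep_conv_3x3",  # separable convolution 3x3
    "sep_conv_5x5",  # separable convolution 5x5
    "dil_conv_3x3",  # dilated separable convolution 3x3
    "dil_conv_5x5",  # dilated separable convolution 5x5
    "max_pool_3x3",  # max pooling 3x3
    "avg_pool_3x3",  # average pooling 3x3
]

# In the evaluation phase described above, 20 cells are stacked to build
# the final network; each edge in a cell selects one operation from
# CANDIDATE_OPS.
assert len(CANDIDATE_OPS) == 8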
“…Searching on ImageNet takes a longer time than on CIFAR-10 due to the larger input size and more network parameters.

Architecture | Top-1 Err. (%) | Top-5 Err. (%) | Params (M) | Search Cost (GPU-days) | Search Method
(Real et al, 2019) | 24.3 | 7.6 | 6.4 | 3150 | evolution
PNAS (Liu et al, 2018a) | 25.8 | 8.1 | 5.1 | 225 | SMBO
MnasNet-92 (Tan et al, 2019) | 25.2 | 8.0 | 4.4 | - | RL
DARTS (2nd) (Liu et al, 2018b) | 26.7 | 8.7 | 4.7 | 4.0 | gradient
SNAS (mild) (Xie et al, 2018) | 27.3 | 9.2 | 4.3 | 1.5 | gradient
GDAS (Dong & Yang, 2019) | 26.0 | 8.5 | 5.3 | 0.21 | gradient
BayesNAS (Zhou et al, 2019) | 26.5 | 8.9 | 3.9 | 0.2 | gradient
P-DARTS (CIFAR-10) (Chen et al, 2019) | 24.4 | 7.4 | 4.9 | 0.3 | gradient
P-DARTS (CIFAR-100) (Chen et al, 2019) | 24.7 | 7.5 | 5.1 | 0.3 | gradient
PC-DARTS (CIFAR-10) (Xu et al, 2019) | 25…”
Section: Results On ImageNet With DARTS Search Space
confidence: 99%
“…NAS is usually time-consuming. We have seen great improvements from 24,000 GPU-days [26] to 0.2 GPU-days [23]. The trick is to first construct a supernet containing the complete search space and train the candidates all at once with bi-level optimization and efficient weight sharing [12].…”
Section: Neural Architecture Search
confidence: 99%
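The excerpt above summarizes the one-shot trick: build a supernet over the whole search space and train all candidates at once with bi-level optimization and weight sharing. Below is a minimal Python sketch of one such alternating, first-order update step; supernet, the two optimizers, the batches, and criterion are assumed to be supplied by the caller (e.g. PyTorch modules and optimizers) and are not part of any cited codebase.

def darts_like_search_step(supernet, w_optimizer, alpha_optimizer,
                           train_batch, val_batch, criterion):
    """One alternating, first-order bi-level update: architecture
    parameters are updated on validation data, shared supernet weights
    on training data (a sketch, not the exact procedure of the cited works)."""
    x_val, y_val = val_batch
    x_trn, y_trn = train_batch

    # Upper level: update architecture parameters (alpha) on the validation loss.
    alpha_optimizer.zero_grad()
    criterion(supernet(x_val), y_val).backward()
    alpha_optimizer.step()

    # Lower level: update the shared supernet weights (w) on the training loss.
    w_optimizer.zero_grad()
    criterion(supernet(x_trn), y_trn).backward()
    w_optimizer.step()

Because every candidate architecture reuses the supernet's weights, repeating this step over the training schedule amortizes the cost of evaluating candidates, which is how the search cost drops to a fraction of a GPU-day.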
“…Despite the inspiring success of NAS, the search space of conventional NAS algorithms is extremely large, making an exhaustive search for the optimal network computationally prohibitive. To fit within the search budget, heuristic search methods are usually leveraged; they can be mainly categorized into reinforcement learning-based [25,26], evolution-based [9,35], Bayesian optimization-based [37,29], and gradient-based methods [19,1,34,14].…”
Section: Introduction
confidence: 99%
“…Though existing one-shot NAS methods have achieved impressive performance, they often consider each layer separately while ignoring the dependencies between the operation choices at different layers, which leads to an inaccurate description and evaluation of the neural architectures during the search. For example, Gaussian processes (GP) in Bayesian optimization require that the input attributes (OPs) be independent of each other [37,29], and the crossover and mutation of OPs in evolutionary search are often carried out separately at each layer [9,35]. In fact, for a feedforward neural network, the choice at a specific layer relates to its previous layers and contributes to its subsequent layers.…”
Section: Introduction
confidence: 99%
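The independence assumption called out in the excerpt can be made concrete: if each layer's operation choice is one-hot encoded and a GP kernel is applied to the concatenation of those per-layer blocks, the similarity contributed by one layer ignores what is chosen at every other layer. The NumPy sketch below is an illustrative assumption on my part (the encoding, the RBF kernel, and NUM_OPS are not taken from the cited GP/BO methods).

# Sketch: an RBF kernel over concatenated per-layer one-hot encodings
# factorizes over layers, so each layer is scored as an independent attribute.
import numpy as np

NUM_OPS = 8  # size of the per-layer operation vocabulary (assumed)

def encode(arch):
    """arch: list of per-layer operation indices -> flat one-hot vector."""
    x = np.zeros((len(arch), NUM_OPS))
    x[np.arange(len(arch)), arch] = 1.0
    return x.ravel()

def rbf_kernel(a, b, lengthscale=1.0):
    """k(a, b) = exp(-||x_a - x_b||^2 / (2 s^2)) = prod over layers of the
    per-layer factor, so layer l's contribution ignores all other layers."""
    d = encode(a) - encode(b)
    return float(np.exp(-np.dot(d, d) / (2.0 * lengthscale ** 2)))

# Two pairs of architectures that differ only at layer 0 receive the same
# kernel value no matter what the remaining layers are, illustrating how
# inter-layer dependencies are invisible to such a surrogate.
print(rbf_kernel([0, 1, 2], [3, 1, 2]), rbf_kernel([0, 5, 6], [3, 5, 6]))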