2020
DOI: 10.1609/aaai.v34i04.6143

Efficient Neural Architecture Search via Proximal Iterations

Abstract: Neural architecture search (NAS) attracts much research attention because of its ability to identify better architectures than handcrafted ones. Recently, differentiable search methods have become the state of the art in NAS, as they can obtain high-performance architectures within several days. However, they still suffer from huge computation costs and inferior performance due to the construction of the supernet. In this paper, we propose an efficient NAS method based on proximal iterations (denoted as NASP). Different…
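For orientation only, here is a minimal sketch, in plain NumPy with illustrative names (not the authors' released implementation), of the kind of proximal step the abstract alludes to: before each update the architecture weights are projected onto a one-hot constraint set, so only one candidate operation per edge stays active.

```python
import numpy as np

def prox_one_hot(alpha):
    """Proximal projection onto the one-hot constraint set: for each edge
    (row of alpha), mark the highest-scoring operation as selected (1.0)
    and zero out the remaining entries."""
    one_hot = np.zeros_like(alpha)
    one_hot[np.arange(alpha.shape[0]), alpha.argmax(axis=1)] = 1.0
    return one_hot

def search_step(alpha, grad_fn, lr=0.01):
    """One simplified search iteration. `grad_fn` is a hypothetical callback
    returning the gradient of the validation loss w.r.t. the architecture
    weights when the given discrete architecture is active."""
    discrete = prox_one_hot(alpha)   # only one operation per edge is evaluated
    grad = grad_fn(discrete)         # gradients come from the discrete architecture
    return alpha - lr * grad         # the continuous copy keeps being optimised
```

The point of the projection is that the forward/backward pass never has to run all candidate operations at once, which is where the claimed efficiency gain over a fully mixed supernet comes from.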

Cited by 81 publications (61 citation statements)
References 13 publications (26 reference statements)
“…proposed DARTS to use search parameters together with a super network, which allows searching with gradient descent. Gradient-based methods (Cai et al, 2018b; Xie et al, 2018; Xu et al, 2019; Yao et al, 2020) attract researchers' attention since they are computationally efficient and easy to implement. We base our method on DARTS and take one step further to reduce the memory consumption of training the super network.…”
Section: Related Work (mentioning)
confidence: 99%
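As a rough illustration of the super-network idea this excerpt refers to, below is a minimal PyTorch sketch (class and variable names are mine, not taken from any of the cited papers) of a DARTS-style mixed edge: every candidate operation is evaluated and blended with softmax weights, which is exactly why the whole supernet must be kept in memory during the search.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style continuous relaxation of one edge (illustrative sketch):
    all candidate operations run on every forward pass and are blended
    with softmax-normalised architecture weights."""
    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)                      # e.g. conv, pooling, skip
        self.alpha = nn.Parameter(torch.zeros(len(candidate_ops)))   # learnable architecture weights

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)                       # relaxed, never exactly one-hot
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```

For example, MixedOp([nn.Conv2d(16, 16, 3, padding=1), nn.MaxPool2d(3, stride=1, padding=1), nn.Identity()]) blends three candidates on a single edge; both the input and alpha receive gradients, so the architecture can be searched with ordinary gradient descent.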
“…That is because the relaxed θ cannot converge to a one-hot vector [Zela et al, 2019, Chu et al, 2020], thus removing those operations at the end of the search actually leads to an architecture different from the final search result. Moreover, the mixed strategy must maintain all operators in the whole supernet, which requires more computational resources than the one-hot vector [Yao et al, 2020].…”
Section: Search Algorithm (mentioning)
confidence: 99%
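To make the contrast in this excerpt concrete, here is a companion sketch (again with hypothetical names and deliberately simplified logic) of the one-hot alternative: only the operation currently selected by argmax is executed, so the remaining candidates never enter the forward pass; how the architecture weights are still updated despite the discrete choice (e.g. via a proximal step on a continuous copy) is omitted.

```python
import torch
import torch.nn as nn

class DiscreteEdge(nn.Module):
    """One-hot counterpart of a mixed edge (illustrative sketch): a single
    operation is active per edge, so the supernet does not have to run
    every candidate at once."""
    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)
        self.alpha = nn.Parameter(torch.zeros(len(candidate_ops)))   # architecture weights

    def forward(self, x):
        k = int(self.alpha.argmax())   # one-hot selection: a single active op
        return self.ops[k](x)          # the update rule for alpha is not shown here
```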
“…Environment: Same GPU, same software version; 2. Settings: Batch size (160), init channel scale (24), training 50 epochs; 3. Implementation: Do not query the performance database.…”
Section: Performance Evaluation (mentioning)
confidence: 99%
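If one wanted to encode the comparison protocol quoted above as a configuration, it might look like the following sketch (key names are purely illustrative and not tied to any released codebase):

```python
# Hypothetical settings mirroring the quoted fair-comparison protocol.
fair_comparison = {
    "environment": {"gpu": "identical model", "software": "identical versions"},
    "batch_size": 160,              # same batch size for every method
    "init_channels": 24,            # same initial channel scale
    "epochs": 50,                   # same training budget during search
    "query_performance_db": False,  # results must come from actual training runs
}
```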
“…Some latest research proposed the non-magnitude-based network selection method [18]. Their method inevitably increases the time overhead, and we provide the performance comparison below:

Method | Params (M) | Test Error (%)
… | 2.8 | 2.85±0.02
PC-DARTS [21] | 3.6 | 2.57±0.07
NASP [24] | 3.3 | 2.83±0.09
GAEA+PC-DARTS [11] | 3.7 | 2.50±0.06
DARTS+PT [18] | 3.0 | 2.61±0.08
SDARTS-RS+PT [18] | 3.3 | 2.54±0.10
SGAS+PT [18] | 3.9 | 2.56±0.…

…the best accuracies ever obtained by DARTS are much higher than both the random search and the average performance of the search space, which suggests that the effectiveness of the magnitude may only last a short time during the training of DARTS.…”
Section: Performance Evaluation (mentioning)
confidence: 99%