2023
DOI: 10.1007/s11263-023-01824-8

Don’t Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance


Cited by 2 publications (2 citation statements). References 54 publications (87 reference statements).
“…We will also explore the opportunity of further improving the efficiency of the learning process by conditionally activating only the top-k experts selected by the gate. To this end, the solution proposed in [49], which accelerates training through ODE-based gradient approximations, looks interesting as an alternative to current sparse-training methods, which suffer from slow convergence and/or underfitting.…”
Section: Discussion
confidence: 99%
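As a rough, hypothetical illustration of what "conditionally activating the top-k experts selected by the gate" can look like in practice, the sketch below routes an input through only the k highest-scoring experts of a small mixture-of-experts layer. Every name in it (topk_moe_forward, expert_fns, gate_weights) is illustrative and not taken from the indexed paper or the citing work.

```python
# Minimal sketch of conditional top-k expert activation (NumPy only).
# All names and shapes are assumptions made for illustration.
import numpy as np

def topk_moe_forward(x, gate_weights, expert_fns, k=2):
    """Route input x through only the top-k experts chosen by a softmax gate."""
    logits = x @ gate_weights                 # (num_experts,) gating scores
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over experts
    topk_idx = np.argsort(probs)[-k:]         # indices of the k largest gates
    topk_probs = probs[topk_idx] / probs[topk_idx].sum()  # renormalise over the selected experts
    # Only the selected experts are evaluated, which is where the compute saving comes from.
    return sum(p * expert_fns[i](x) for p, i in zip(topk_probs, topk_idx))

# Toy usage: 4 experts, each a small linear map, input dimension 8.
rng = np.random.default_rng(0)
d, num_experts = 8, 4
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d))) for _ in range(num_experts)]
gate_W = rng.normal(size=(d, num_experts))
y = topk_moe_forward(rng.normal(size=d), gate_W, experts, k=2)
print(y.shape)  # (8,)
```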
“…Furthermore, the initialization and joint training of gates and experts are crucial to avoid the pitfalls of random initial routing and the long convergence times associated with REINFORCE-based updates [12,48]. Other approaches use gradient-approximation methods [49], which can reduce the computational overhead.…”
Section: Mixture of Experts (MoE) Classifiers
confidence: 99%
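The "gradient-approximation methods" mentioned above cover a family of tricks for backpropagating through discrete routing decisions. The sketch below shows one generic member of that family, a straight-through estimator for a top-k gate; it is an assumed illustration, not the ODE-based method the quoted text attributes to [49], and all names in it are hypothetical.

```python
# Generic straight-through gradient approximation for a discrete top-k gate.
# This is an illustrative sketch, not the method of [49] or of the indexed paper.
import torch

def straight_through_topk(gate_logits, k=2):
    """Hard top-k mask in the forward pass, soft softmax gradient in the backward pass."""
    probs = torch.softmax(gate_logits, dim=-1)
    topk = torch.topk(probs, k, dim=-1)
    hard = torch.zeros_like(probs).scatter_(-1, topk.indices, 1.0)
    # Forward value equals the hard 0/1 mask; gradients flow through `probs` only.
    return hard + probs - probs.detach()

# Toy usage: a batch of 3 inputs routed over 8 experts.
logits = torch.randn(3, 8, requires_grad=True)
mask = straight_through_topk(logits, k=2)
loss = (mask * torch.randn(3, 8)).sum()
loss.backward()
print(logits.grad.shape)  # torch.Size([3, 8])
```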