2021
DOI: 10.48550/arxiv.2110.03360
Preprint

Sparse MoEs meet Efficient Ensembles

Abstract: Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, lead to strong performance. We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs). First, we show that the two approaches have complementary features whose combination is beneficial. Then, we present partitioned batch ensembles, an efficient ensemble of sparse MoEs that takes the best of both classes of models. Exte…
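The abstract's core idea, aggregating submodel outputs at the prediction level, can be illustrated with a minimal sketch: average the class probabilities of E independently trained (or independently routed) members. This is illustrative only and is not the paper's partitioned-batch-ensemble implementation; all names here are placeholders.

```python
# Minimal sketch of prediction-level ensembling: softmax each member's
# logits, then average the resulting probabilities across members.
import numpy as np

def ensemble_predict(member_logits):
    """member_logits: array of shape (E, batch, classes)."""
    # Numerically stable softmax over the class axis, per member.
    z = member_logits - member_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Ensemble prediction = mean of member probabilities.
    return probs.mean(axis=0)  # shape (batch, classes)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 2, 10))  # E=4 members, batch of 2, 10 classes
print(ensemble_predict(logits).shape)  # (2, 10)
```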

Cited by 2 publications (2 citation statements, published 2022–2023)
References 11 publications
“…We perform extensive ablation experiments to show the effectiveness of SKDBERT in terms of teacher ensemble, sampling distribution, KD paradigm, extra learning procedure and distillation objective. Appropriately increasing the number of teachers can effectively improve the diversity of predictions (Allingham et al., 2021), yielding better performance. As a result, we discuss the effectiveness of weak teachers (e.g., T01 to T03 for SKDBERT_4, T01 to T06 for SKDBERT_6).…”
Section: Ablation Studies
confidence: 99%
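The statement above invokes "diversity of prediction" among teachers. One common way to quantify this, shown in the hedged sketch below, is the average pairwise disagreement of the teachers' predicted labels; the SKDBERT authors may use a different measure, so treat this as illustrative only.

```python
# Average pairwise disagreement rate across K teachers' hard predictions.
import numpy as np

def pairwise_disagreement(preds):
    """preds: (K, N) array of predicted class labels for N examples."""
    K = preds.shape[0]
    total, pairs = 0.0, 0
    for i in range(K):
        for j in range(i + 1, K):
            total += (preds[i] != preds[j]).mean()  # fraction of examples where i, j differ
            pairs += 1
    return total / pairs

rng = np.random.default_rng(1)
teacher_preds = rng.integers(0, 3, size=(5, 100))  # K=5 teachers, N=100 examples
print(round(pairwise_disagreement(teacher_preds), 3))
```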
“…Key future challenges: quantifying data redundancy shall be investigated in our future study, building on the work of (Birodkar et al., 2019) and (Guo et al., 2021). To improve the heuristic function, recent work on explicit ensembles (Lakshminarayanan et al., 2016; Allingham et al., 2021) shows strong results for uncertainty computation, and (Aghdam et al., 2019) show that adding temporal reasoning can be beneficial for data selection on the object detection task. We aim to further our study by experimenting with different budget sizes while testing on the complete Semantic-KITTI dataset.…”
Section: Model Stability and Effectiveness For Sampling
confidence: 99%
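The statement above cites explicit ensembles as a source of uncertainty estimates for a data-selection heuristic. A minimal sketch of that idea, under the assumption that predictive entropy of the ensemble mean is used as the uncertainty score (the cited works may score differently; all names are placeholders):

```python
# Ensemble predictive entropy as an uncertainty score for sample selection.
import numpy as np

def predictive_entropy(member_probs):
    """member_probs: (E, N, C) class probabilities from E ensemble members."""
    mean_p = member_probs.mean(axis=0)  # (N, C) ensemble prediction
    # Entropy of the averaged distribution; epsilon avoids log(0).
    return -(mean_p * np.log(mean_p + 1e-12)).sum(axis=-1)  # (N,)

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(4), size=(3, 6))  # E=3 members, N=6 samples, C=4 classes
scores = predictive_entropy(p)
selected = np.argsort(scores)[::-1][:2]  # select the 2 most uncertain samples
print(selected)
```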