2022
DOI: 10.48550/arxiv.2202.06985
Preprint

Deep Ensembles Work, But Are They Necessary?

Abstract: Ensembling neural networks is an effective way to increase accuracy, and can often match the performance of larger models. This observation poses a natural question: given the choice between a deep ensemble and a single neural network with similar accuracy, is one preferable over the other? Recent work suggests that deep ensembles may offer benefits beyond predictive power: namely, uncertainty quantification and robustness to dataset shift. In this work, we demonstrate limitations to these purported benefits, …

Cited by 5 publications (10 citation statements); references 25 publications.
Citation statements by type: 1 supporting, 9 mentioning, 0 contrasting.
“…Second, generating an ensemble with a size of at least 10 appears to be a sensible choice, with only minor improvements being observed for more than 20 members. This corresponds to the results in Fort et al (2019) and ensemble sizes typically chosen in the literature (Lakshminarayanan et al, 2017; Rasp and Lerch, 2018), but the benefits of generating more ensemble members need to be balanced against the computational costs, and sometimes smaller ensembles have been suggested (Ovadia et al, 2019; Abe et al, 2022). Third, aggregating forecast distributions via VI is often superior to the LP.…”
Section: Discussion (supporting)
confidence: 54%
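The two aggregation schemes named in the statement above can be illustrated with a short sketch. Assuming, as is common in the forecast-combination literature (e.g., Lichtendahl et al, 2013), that LP denotes the linear pool (averaging member CDFs) and VI denotes Vincentization (averaging member quantiles), the Python snippet below contrasts the two for a small ensemble of hypothetical Gaussian member forecasts; the member parameters, grid, and function names are illustrative assumptions, not taken from the cited paper.

```python
# Minimal sketch: linear pool (LP) vs. Vincentization (VI) for K Gaussian members.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
K = 10                                   # ensemble size, in the range discussed above
mus = rng.normal(0.0, 0.5, size=K)       # hypothetical member means
sigmas = rng.uniform(0.8, 1.2, size=K)   # hypothetical member spreads

def lp_quantiles(levels, mus, sigmas, grid=np.linspace(-10, 10, 4001)):
    # Linear pool: average the member CDFs, then invert the mixture CDF numerically.
    mix_cdf = np.mean([stats.norm.cdf(grid, m, s) for m, s in zip(mus, sigmas)], axis=0)
    return np.interp(levels, mix_cdf, grid)

def vi_quantiles(levels, mus, sigmas):
    # Vincentization: average the member quantile functions level by level.
    return np.mean([stats.norm.ppf(levels, m, s) for m, s in zip(mus, sigmas)], axis=0)

probs = [0.25, 0.5, 0.75]
print("LP quartiles:", np.round(lp_quantiles(probs, mus, sigmas), 3))
print("VI quartiles:", np.round(vi_quantiles(probs, mus, sigmas), 3))
```

One known practical difference is that the linear pool tends to produce a wider, more dispersed combined distribution than quantile averaging, which is part of why the two schemes can rank differently under proper scoring rules.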
“…Technically, we here use the unified PIT, a generalization proposed in Vogel et al (2018), due to the format of some of the aggregated forecast distributions.² For example, Lichtendahl et al (2013) and Abe et al (2022) show that the score of the LP forecast is at least as good as the average score of the individual components in terms of different proper scoring rules.…”
(mentioning)
confidence: 99%
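For the logarithmic score in particular, the cited property follows from Jensen's inequality: since log is concave, the log of the averaged member densities at the observation is at least the average of the member log densities. A minimal numeric check of this, with arbitrary Gaussian members chosen only for illustration, is sketched below.

```python
# Minimal check: the linear pool's log score is at least the average member log score.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mus = rng.normal(0.0, 1.0, size=5)       # hypothetical member means
sigmas = rng.uniform(0.5, 2.0, size=5)   # hypothetical member spreads
y = 0.3                                  # an arbitrary observation

densities = np.array([stats.norm.pdf(y, m, s) for m, s in zip(mus, sigmas)])
lp_log_score = np.log(densities.mean())          # log score of the linear pool at y
avg_member_log_score = np.log(densities).mean()  # average member log score at y

# Jensen's inequality guarantees the pool is at least as good (larger is better here).
assert lp_log_score >= avg_member_log_score
print(round(float(lp_log_score), 4), round(float(avg_member_log_score), 4))
```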
“…Specifically, we propose to train a set of GNNs {GNN_1, GNN_2, ..., GNN_ℓ}. Given a set of training nodes D ⊆ V_1, we generate bootstraps {D^(1), D^(2), ..., D^(ℓ)} subject to the constraint that |D^(i) ∩ Ω_u^k| = 1 for all u ∈ D and i ≤ ℓ (i.e., each bootstrap contains exactly one relative of each training node). This constraint allows us to avoid the overrepresentation problem, address CH1 by sampling with replacement, and address CH2 by training each GNN_i on different neighborhood subspaces as represented in D^(i).…”
Section: Why Deep Graph Ensembles? (mentioning)
confidence: 99%
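A minimal sketch of the constrained bootstrap described in the statement above is given here. It assumes a mapping `relatives` from each training node u to its relative set (Ω_u^k in the quote) and draws exactly one relative per training node for each bootstrap; the function name and toy data are assumptions for illustration, not taken from the cited paper.

```python
# Sketch of the constrained bootstrap: one relative per training node per bootstrap.
import random

def make_bootstraps(train_nodes, relatives, num_bootstraps, seed=0):
    """Each bootstrap D^(i) contains exactly one member of relatives[u] for every
    training node u, so |D^(i) ∩ Ω_u^k| = 1; draws are made with replacement
    across bootstraps."""
    rng = random.Random(seed)
    return [
        [rng.choice(relatives[u]) for u in train_nodes]
        for _ in range(num_bootstraps)
    ]

# Toy example: three training nodes, each with a small set of relatives.
relatives = {0: [0, 7, 9], 1: [1, 4], 2: [2, 5, 8]}
for i, D_i in enumerate(make_bootstraps([0, 1, 2], relatives, num_bootstraps=4)):
    print(f"D^({i + 1}) = {D_i}")
```

Each GNN_i would then be trained on its own bootstrap D^(i), so the ensemble members see different neighborhood subspaces.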
“…DGE-batch* therefore represents a traditional GNN that is trained with awareness of CH1 and CH3, and addresses CH2 in a manner similar to a READOUT function [44]. We considered DGE-batch* an important variant to evaluate because its performance illuminates the importance of using an ensemble instead of a single, more complex model to solve CH2 [1].…”
Section: Training and Inference (mentioning)
confidence: 99%
“…We showcase these properties theoretically and demonstrate the benefits of transformation ensembles empirically on several semi-structured data sets. With transformation ensembles we are able to provide empirical evidence for answering open questions in deep ensembling (Abe et al, 2022). For instance, the increased flexibility of classical deep ensembles over their members does not seem to be necessary for improving prediction performance or allowing uncertainty quantification.…”
Section: Our Contribution (mentioning)
confidence: 99%