“…Beyond practical deployments of machine-learned models, the reproducibility crisis in the machine learning academic world has also been well documented: see [Pineau et al., 2021] and the references therein for an excellent discussion of the reasons for irreproducibility (insufficient exploration of hyperparameters and experimental setups, lack of sufficient documentation, inaccessible code, and differing computational hardware) and for mitigation recommendations. However, recent papers [D'Amour et al., 2020, Dusenberry et al., 2020, Snapp and Shamir, 2021, Summers and Dinneen, 2021, Yu et al., 2021] have also demonstrated that even when models are trained on identical datasets with identical optimization algorithms, architectures, and hyperparameters, they can produce significantly different predictions on the same example. This type of irreproducibility may be caused by multiple factors [D'Amour et al., 2020, Fort et al., 2020, Frankle et al., 2020, Shallue et al., 2018, Snapp and Shamir, 2021, Summers and Dinneen, 2021], such as non-convexity of the objective, random initialization, nondeterminism in training (e.g., data shuffling, parallelism, random schedules, and the hardware used), and round-off and quantization errors.…”
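One of the round-off effects alluded to above can be seen directly: floating-point addition is not associative, so a parallel reduction that accumulates the same gradient terms in a different order can yield a different sum. A minimal sketch in plain Python, with hypothetical values chosen to make the effect visible:

```python
# Floating-point addition is not associative: the order in which terms
# are accumulated (e.g., under different parallel reduction schedules)
# can change the result. Values below are illustrative only.
terms = [1e16, 1.0, -1e16]

left_to_right = (terms[0] + terms[1]) + terms[2]  # 1.0 is absorbed into 1e16
reordered = (terms[0] + terms[2]) + terms[1]      # the large terms cancel first

print(left_to_right)  # 0.0
print(reordered)      # 1.0
```

When such order-dependent sums feed a non-convex optimizer over millions of steps, two runs that are identical on paper can diverge to different minima, which is one mechanism behind the prediction differences described above.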