Pierre Foret scite author profile

Sharpness-Aware Minimization for Efficiently Improving Generalization

In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by the connection between the geometry of the loss landscape and generalization (including a generalization bound that we prove here), we introduce a novel, effective procedure for instead simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-{10, 100}, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels.

* Work done as part of the Google AI Residency program.
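The min-max idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it assumes an L2 neighborhood of radius `rho` and a first-order approximation of the inner maximization, and the names `sam_step` and `grad_fn` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware update (illustrative sketch, not the authors' code).

    Assumptions: an L2 ball of radius rho around w, and a first-order
    approximation of the worst-case perturbation inside that ball.
    """
    g = grad_fn(w)
    # Approximate worst-case perturbation: a step of length rho along the gradient.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient evaluated at the perturbed point drives the actual update.
    g_perturbed = grad_fn(w + eps)
    return w - lr * g_perturbed


def quadratic_grad(w):
    """Gradient of the toy loss L(w) = 0.5 * ||w||^2 (hypothetical example loss)."""
    return w


if __name__ == "__main__":
    w = np.array([3.0, 4.0])
    for _ in range(5):
        w = sam_step(w, quadratic_grad)
    print(w)
```

Each step costs two gradient evaluations, one at the current parameters and one at the perturbed point, which is the price of accounting for sharpness as well as loss value in this sketch.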


NeurIPS 2020 Competition: Predicting Generalization in Deep Learning

Jiang, Foret, Yak et al. 2020

Preprint

Understanding generalization is arguably one of the most important questions in deep learning. Deep learning has been successfully applied to a large number of problems ranging from pattern recognition to complex decision making, but researchers have recently raised many concerns about it, the most important of which is generalization. Despite numerous attempts, conventional statistical learning approaches have not yet been able to provide a satisfactory explanation of why deep learning works. A recent line of work aims to address the problem by predicting generalization performance through complexity measures. In this competition, we invite the community to propose complexity measures that can accurately predict the generalization of models. A robust and general complexity measure could lead to a better understanding of deep learning's underlying mechanism and the behavior of deep models on unseen data, or shed light on better generalization bounds. All these outcomes will be important for making deep learning more robust and reliable.

* Lead organizer: Yiding Jiang. Scott Yak and Pierre Foret helped implement a large portion of the infrastructure; the order of the remaining organizers is randomized.


A Multilingual View of Unsupervised Machine Translation

García, Foret, Sellam et al. 2020

We present a probabilistic framework for multilingual neural machine translation that encompasses supervised and unsupervised setups, focusing on unsupervised translation. In addition to studying the vanilla case where only monolingual data is available, we propose a novel setup where one language in the (source, target) pair is not associated with any parallel data, but there may exist auxiliary parallel data that contains the other. This auxiliary data can naturally be utilized in our probabilistic framework via a novel cross-translation loss term. Empirically, we show that our approach yields higher BLEU scores than state-of-the-art unsupervised models on the WMT'14 English-French, WMT'16 English-German, and WMT'16 English-Romanian datasets in most directions.


A Multilingual View of Unsupervised Machine Translation

García, Foret, Sellam et al. 2020

Preprint

Interpretable Identification of Cybersecurity Vulnerabilities from News Articles

Foret, Ruşeţi, Sandescu et al. 2021


