2021
DOI: 10.48550/arxiv.2106.02266
Preprint

SAND-mask: An Enhanced Gradient Masking Strategy for the Discovery of Invariances in Domain Generalization

Soroosh Shahtalebi,
Jean-Christophe Gagnon-Audet,
Touraj Laleh
et al.

Abstract: A major bottleneck in the real-world application of machine learning models is their failure to generalize to unseen domains whose data distribution is not i.i.d. with respect to the training domains. This failure often stems from learning non-generalizable features in the training domains that are spuriously correlated with the label of the data. To address this shortcoming, there has been a growing surge of interest in learning good explanations that are hard to vary, which is studied under the notion of Out-of-Distribution (OOD) generalization. […]

Cited by 8 publications (13 citation statements). References 24 publications (55 reference statements).

“…The reduction of the representation distribution mismatch across source domains can be achieved by minimizing the maximum mean discrepancy criterion (Gretton et al. 2012) combined with an adversarial autoencoder (MMD) (Li et al. 2018b), minimizing the difference between the means (Tzeng et al. 2014) or covariance matrices (CORAL) (Sun and Saenko 2016) in the embedding space across different domains, or minimizing a contrastive loss as a regularization (Motiian et al. 2017; Yoon, Hamarneh, and Garbi 2019; Mahajan, Tople, and Sharma 2020), e.g., SelfReg (Kim et al. 2021). Domain alignment is also addressed by promoting loss gradient alignment across different domains via inner product maximization (Fish) (Shi et al. 2021), binary gradient masking (AND-mask) (Parascandolo et al. 2020; Shahtalebi et al. 2021), or continuous gradient masking (SAND-mask) (Shahtalebi et al. 2021).…”
Section: Domain Generalization
Mentioning confidence: 99%
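The second-order alignment idea named in this statement can be made concrete with a short sketch. This is a minimal numpy illustration, not the cited authors' implementation: it combines the mean-difference penalty of Tzeng et al. and the covariance-difference penalty of CORAL in a single function, and it omits the normalization constant used in the original CORAL paper.

```python
import numpy as np

def alignment_penalty(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Penalize mismatch in first- and second-order feature statistics
    between two domains (mean difference + covariance difference)."""
    mean_gap = np.mean(feats_a, axis=0) - np.mean(feats_b, axis=0)
    cov_gap = np.cov(feats_a, rowvar=False) - np.cov(feats_b, rowvar=False)
    # Squared L2 distance of the means plus squared Frobenius distance
    # of the covariance matrices.
    return float(np.sum(mean_gap ** 2) + np.sum(cov_gap ** 2))
```

Adding this penalty, scaled by a regularization weight, to the task loss pushes the embedding statistics of the source domains toward each other, which is the shared mechanism behind the alignment methods listed above.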
“…Domain alignment methods attempt to learn a domain-invariant representation of the data from the source domains by regularizing the learning objective. Variants of such a regularization include the minimization across the source domains of the maximum mean discrepancy criterion (MMD) [21,35], the minimization of a distance metric between the domain-specific means [71] or covariance matrices [69], the minimization of a contrastive loss [50,83,44,29], or the maximization of loss gradient alignment [65,63]. Other works use adversarial training with a domain discriminator model [18,37] for the same purpose.…”
Section: Domain Generalization
Mentioning confidence: 99%
“…Though proven advantageous in some curated test conditions, their proposed mechanism intrinsically relies on arithmetic averaging of Hessians. Furthermore, AND-masking on the average loss landscape might induce "dead zones", where the gradients from different environments are unable to update the parameters unless their directions strictly match with each other (19).…”
Section: Introduction
Mentioning confidence: 99%
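The "dead zone" behavior described in this statement can be illustrated with a small sketch of binary AND-masking. This is a minimal numpy illustration under assumed shapes (one flattened gradient vector per environment), not the reference implementation; with a full agreement threshold, a parameter receives no update unless every environment's gradient has the same sign.

```python
import numpy as np

def and_mask_update(env_grads: list, threshold: float = 1.0) -> np.ndarray:
    """Binary AND-mask: average the per-environment gradients, but zero out
    every component whose sign is not shared widely enough across environments."""
    grads = np.stack(env_grads)                               # (n_envs, n_params)
    sign_agreement = np.abs(np.mean(np.sign(grads), axis=0))  # 1.0 = unanimous
    mask = (sign_agreement >= threshold).astype(grads.dtype)
    return mask * grads.mean(axis=0)

# With threshold=1.0, a single disagreeing environment creates a "dead zone":
# the corresponding parameter gets a zero gradient and is never updated.
g = and_mask_update([np.array([0.5, -0.2]), np.array([0.3, 0.4])])
# -> component 0 passes (both gradients positive); component 1 is masked to 0.
```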
“…Nevertheless, the masking strategy requires that the directions of the gradients from different environments strictly match with each other in order for the gradients to update that particular parameter. To address this issue, Shahtalebi et al. proposed a smoothed AND-masking algorithm (SAND-mask), which not only validates the invariance in the direction of gradients but also promotes agreement among the gradient magnitudes (19). Yet, in their reported experiments, the proposed masking strategy barely matches the performance of other algorithms, including the AND-masking strategy.…”
Section: Introduction
Mentioning confidence: 99%
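The smoothed rule this statement describes can be sketched as follows. This is a hedged numpy approximation of SAND-mask's continuous masking, not the authors' exact code: a tanh-based mask grows with sign agreement across environments and is scaled by an inverse-variance term so that gradient magnitudes must also agree; the hyperparameters k and tau and the epsilon constant are illustrative assumptions.

```python
import numpy as np

def sand_mask_update(env_grads: list, k: float = 10.0,
                     tau: float = 0.5, eps: float = 1e-12) -> np.ndarray:
    """Continuous (smoothed) gradient mask: instead of a hard sign test,
    each component gets a soft weight in [0, 1] that rises with sign
    agreement and falls with cross-environment gradient variance."""
    grads = np.stack(env_grads)                               # (n_envs, n_params)
    sign_agreement = np.abs(np.mean(np.sign(grads), axis=0))  # direction agreement
    inv_var = 1.0 / (grads.var(axis=0) + eps)                 # magnitude agreement
    mask = np.tanh(k * inv_var * (sign_agreement - tau))
    mask = np.clip(mask, 0.0, 1.0)  # keep only components that promote agreement
    return mask * grads.mean(axis=0)
```

Unlike the binary mask sketched earlier, components with partial agreement still receive a damped update, which is how the smoothed variant avoids hard dead zones.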