Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1662

Adversarial Removal of Demographic Attributes Revisited

Abstract: Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a heldout subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to th…
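
As a concrete reading of this protocol, here is a minimal Python sketch (function and variable names are ours, not the authors' code): a diagnostic classifier is trained on frozen representations, then scored on a held-out split of the same sample, on a fresh in-domain sample, and on an out-of-domain sample.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_generalization(Z, y, Z_fresh, y_fresh, Z_ood, y_ood):
    """Train a diagnostic classifier on frozen representations Z with
    protected-attribute labels y, then score it on three test sets."""
    Z_tr, Z_held, y_tr, y_held = train_test_split(Z, y, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
    return {
        "held-out (same sample)": probe.score(Z_held, y_held),
        "new in-domain sample": probe.score(Z_fresh, y_fresh),
        "new domain": probe.score(Z_ood, y_ood),
    }

# Toy stand-ins; in the paper these would be the debiased network's representations.
rng = np.random.default_rng(0)
Z, y = rng.normal(size=(1000, 64)), rng.integers(0, 2, 1000)
print(probe_generalization(Z, y,
                           rng.normal(size=(200, 64)), rng.integers(0, 2, 200),
                           rng.normal(size=(200, 64)), rng.integers(0, 2, 200)))

The paper's finding is that the first score overstates the latter two: accuracy on the classifier's own sample does not transfer to new samples or domains.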



Cited by 37 publications (51 citation statements).
References 9 publications (25 reference statements).
“…1. We observe that we achieve scores similar to those reported in previous studies (Barrett et al., 2019; Elazar and Goldberg, 2018). This experiment shows that, when training to solve the main task, the classifier learns information about the protected attribute, i.e., the attacker's accuracy is better than random guessing.…”
Section: Applications to Fairness (supporting)
confidence: 84%
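
The "better than random guessing" comparison in this statement can be sketched as follows (toy data; the attacker's predictions are assumed given):

from collections import Counter
import numpy as np

def majority_baseline_accuracy(labels):
    # Accuracy of always predicting the most frequent protected attribute,
    # the natural "random guessing" reference under class imbalance.
    return Counter(labels).most_common(1)[0][1] / len(labels)

y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])  # protected-attribute labels (toy)
y_pred = np.array([0, 1, 0, 1, 1, 0, 0, 0])  # attacker predictions (toy)

print("attacker accuracy:", (y_pred == y_true).mean())
print("majority baseline:", majority_baseline_accuracy(y_true))
# An attacker beating the baseline indicates the main-task representations
# still encode the protected attribute.
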
“…6 shows that the cross-entropy loss leads to a lower bound (up to a constant) on the MI. Although the cross-entropy can lead to good estimates of the conditional entropy, the adversarial approaches for classification and sequence generation (Barrett et al., 2019; John et al., 2018), which consist in maximizing the cross-entropy, induce a degeneracy (unbounded loss) as λ increases in the underlying optimization problem. As we will observe in the next section, our variational upper bound in Th.…”
Section: Comparison to Existing Methods (mentioning)
confidence: 99%
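
The bound in this quotation can be reconstructed as follows (our notation, not the cited paper's). For any attacker q(y | z), the cross-entropy upper-bounds the true conditional entropy,

    H(Y \mid Z) = \mathbb{E}_{p(z,y)}[-\log p(y \mid z)] \le \mathbb{E}_{p(z,y)}[-\log q(y \mid z)] = \mathrm{CE}(q),

and since I(Y; Z) = H(Y) - H(Y \mid Z), this gives the lower bound

    I(Y; Z) \ge H(Y) - \mathrm{CE}(q),

tight up to the constant H(Y). The degeneracy then comes from objectives of the form \min_\theta L_{\text{task}}(\theta) - \lambda \, \mathrm{CE}(q): the cross-entropy term is unbounded above, so for large \lambda the combined loss can be driven to -\infty.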
“…While adversarial methods have shown impressive performance in various machine learning tasks, and have been applied to the removal of sensitive information (Elazar and Goldberg, 2018; Coavoux et al., 2018; Resheff et al., 2019; Barrett et al., 2019), they are notoriously hard to train. Elazar and Goldberg (2018) evaluated adversarial methods for the removal of demographic information from representations.…”
Section: Related Work (mentioning)
confidence: 99%
“…in cross-lingual (Schuster et al., 2019; Liu et al., 2019) or multilingual settings (Cao et al., 2020). In contrast to prior work, we experiment with aligning BERT using adversarial training, which is related to using adversarial training for domain adaptation (Ganin et al., 2016), coping with bias or confounding variables (Li et al., 2018; Raff and Sylvester, 2018; Zhang et al., 2018; Barrett et al., 2019; McHardy et al., 2019), or transferring models from a source to a target language (Zhang et al., 2017; Keung et al., 2019; Wang et al., 2019). Similar to Chen and Cardie (2018), we use a multinomial discriminator in our setting.…”
Section: Related Work (mentioning)
confidence: 99%
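
A minimal PyTorch sketch (ours, not the cited papers' code) of the setup this statement describes: adversarial alignment through a gradient-reversal layer (Ganin et al., 2016) feeding a multinomial discriminator, so the encoder is pushed toward features the discriminator cannot attribute to any particular domain or language.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity on the forward pass, reversed (scaled) gradient on the
        # backward pass: the encoder learns to fool the discriminator.
        return -ctx.lambd * grad_output, None

class MultinomialDiscriminator(nn.Module):
    def __init__(self, dim, n_domains, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.clf = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, n_domains))

    def forward(self, h):
        return self.clf(GradReverse.apply(h, self.lambd))

# Usage: h would be an encoder output such as BERT's pooled representation;
# here a random stand-in.
disc = MultinomialDiscriminator(dim=768, n_domains=4)
h = torch.randn(8, 768, requires_grad=True)
loss = nn.CrossEntropyLoss()(disc(h), torch.randint(0, 4, (8,)))
loss.backward()  # gradients reaching h are reversed w.r.t. the domain loss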