2021
DOI: 10.1002/int.22415

Missing the missing values: The ugly duckling of fairness in machine learning

Abstract: Nowadays, there is increasing concern in machine learning about the causes underlying unfair decision making, that is, algorithmic decisions that discriminate against some groups over others, especially groups defined by protected attributes such as gender, race and nationality. Missing values are one frequent manifestation of all these latent causes: protected groups are more reluctant to give information that could be used against them, sensitive information for some groups can be erased by human op…

Cited by 35 publications (41 citation statements).
References 58 publications (90 reference statements).
“…In contrast, we focus on the case where the input features are missing and may thus impact performance in downstream prediction tasks. The works that are most relevant to our work are Fernando et al (2021); Wang and Singh (2021) as they examine the intersection of general data missingness and fairness. While Fernando et al (2021) presents a comprehensive investigation on the relationship between fairness and missing values, their analyses are limited to observational and empirical studies on how different ways of handling missing values can affect fairness.…”
Section: Related Work (mentioning)
confidence: 99%
“…The works that are most relevant to our work are Fernando et al (2021); Wang and Singh (2021) as they examine the intersection of general data missingness and fairness. While Fernando et al (2021) presents a comprehensive investigation on the relationship between fairness and missing values, their analyses are limited to observational and empirical studies on how different ways of handling missing values can affect fairness. Wang and Singh (2021) proposes reweighting scheme that assigns lower weight to data points with missing values by extending the preprocessing scheme given in Calmon et al (2017).…”
Section: Related Work (mentioning)
confidence: 99%
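The reweighting idea summarized in the excerpt above can be made concrete with a minimal sketch. This is an assumption-laden illustration, not the actual preprocessing scheme of Calmon et al (2017) or the extension proposed by Wang and Singh (2021): the weighting rule, the `penalty` parameter, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

def missingness_weights(df: pd.DataFrame, base_weight: float = 1.0,
                        penalty: float = 0.5) -> np.ndarray:
    """Assign lower sample weights to rows that contain missing values.

    Hypothetical illustration only: each row's weight shrinks in
    proportion to the fraction of its features that are missing.
    This is not the scheme from Wang and Singh (2021).
    """
    frac_missing = df.isna().mean(axis=1).to_numpy()  # fraction of missing features per row
    weights = base_weight * (1.0 - penalty * frac_missing)
    return np.clip(weights, a_min=0.0, a_max=None)    # weights never go negative

# Rows with more missing entries receive lower weight; a downstream
# classifier could consume these values via a `sample_weight` argument.
X = pd.DataFrame({"age": [25, np.nan, 40], "income": [50_000, 30_000, np.nan]})
print(missingness_weights(X))  # [1.0, 0.75, 0.75]
```

Under this toy rule, complete rows keep full weight while rows with missing entries are down-weighted rather than dropped, which is the general spirit of the reweighting approach described in the citation statement.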
“…This means that if the ML algorithms had been trained on the modified training data, it would not have exhibited the unexpected or undesirable behavior or would have exhibited this behavior to a lesser degree. Explanations generated by our framework, which complement existing approaches in XAI, are crucial for helping system developers and ML practitioners to debug ML algorithms for data errors and bias in training data, such as measurement errors and misclassifications [35,42,94], data imbalance [27], missing data and selection bias [29,62,63], covariate shift [74,82], technical biases introduced during data preparation [85], and poisonous data points injected through adversarial attacks [36,43,65,83]. It is known in the algorithmic fairness literature that information about the source of bias is critically needed to build fair ML algorithms because no current bias mitigation solution fits all situations [27,31,36,82,94].…”
Section: Introduction (mentioning)
confidence: 99%
“…More compact and coherent descriptions are needed. Furthermore, sources of bias and discrimination in training data are typically not randomly distributed across different sub-populations; rather they manifest systematic errors in data collection, selection, feature engineering, and curation [29,35,42,62,63,70,94]. That is, more often than not, certain cohesive subsets of training data are responsible for bias.…”
Section: Introduction (mentioning)
confidence: 99%