2023
DOI: 10.1145/3588433

Representation Bias in Data: A Survey on Identification and Resolution Techniques

Abstract: Data-driven algorithms are only as good as the data they work with, while data sets, especially social data, often fail to represent minorities adequately. Representation bias in data can arise for various reasons, ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods. Given that “bias in, bias out”, one cannot expect AI-based solutions to have equitable outcomes for societal applications without addressing issues such as representation bias…


Cited by 21 publications (43 citation statements)
References 78 publications
“…Training data bias: Biases in training data are reflected in downstream models. Under-represented subgroups can suffer lower accuracy due to insufficient weight in the training data (Buolamwini & Gebru, 2018; Chen et al., 2018; Kleinberg et al., 2022; Shahbazi et al., 2023), and socially undesirable biases in data are often amplified by models (Bolukbasi et al., 2016; Caliskan et al., 2017; Taori & Hashimoto, 2023). Various papers have studied how re-weighting or curating datasets can mitigate these biases (Zhao et al., 2017; Ryu et al., 2017; Tschandl et al., 2018; Yang et al., 2020), even finding that overall performance is improved by over-weighting minority groups and actively increasing diversity in datasets (Gao et al., 2020; Rolf et al., 2021; Lee et al., 2022).…”
Section: Related Work
confidence: 99%
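The re-weighting idea summarized in the statement above can be made concrete with a short sketch. This is a generic inverse-group-frequency scheme on synthetic data, offered only as an illustration; it is not the specific method of Zhao et al. (2017) or any other paper cited in the snippet, and the group labels, features, and model choice are all hypothetical.

```python
# A minimal, hypothetical sketch of group re-weighting: samples from an
# under-represented group get proportionally larger weight so that each
# group contributes equally to the training loss. Data and groups are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1_000
X = rng.normal(size=(n, 5))                  # hypothetical features
y = rng.integers(0, 2, size=n)               # hypothetical binary labels
group = rng.choice(["majority", "minority"], size=n, p=[0.9, 0.1])

# Inverse-frequency ("balanced") weights: weight_i = n / (k * count(group_i)),
# where k is the number of groups.
_, idx = np.unique(group, return_inverse=True)
counts = np.bincount(idx)
weights = n / (len(counts) * counts[idx])

model = LogisticRegression()
model.fit(X, y, sample_weight=weights)       # scikit-learn accepts per-sample weights
```

Strategies in the cited literature go further than this baseline, e.g. actively curating datasets or deliberately over-weighting minority groups, rather than merely equalizing aggregate group weight.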
“…The underrepresentation of racial and ethnic minoritized groups in research can perpetuate representation bias in data collection, discrimination, and disparities. 25,26 Broadly, racial bias can be described as preconceptions, unconscious ideas, or experiences that make people think and act in a prejudiced manner. 27 Bias in data refers to errors that arise when certain elements of a database receive more attention or are overrepresented.…”
Section: Racial Bias In Survey Research
confidence: 99%
“…Health care organizations using such data to inform protocols, program screenings, models, or algorithms risk having inherent bias in generated results. 26 For example, incomplete risk scores used to inform resource allocation (i.e., before/during/after disasters) could perpetuate racial disparities rather than eliminate them. 24 Prioritizing the perspectives and contributions of minoritized groups who have been disproportionately harmed by disasters can help address representation bias in the data collection process and facilitate more equitable knowledge construction and survey tools.…”
Section: Racial Bias In Survey Research
confidence: 99%
“…The impact of Artificial Intelligence (AI) has been significant across nearly every application domain. However, the quality of AI models largely depends on the quality of the datasets used to train them [1-5]. Moreover, several past incidents highlight the devastating consequences of using biased and erroneous datasets for training AI models, such as discriminatory treatment of users based on demographic characteristics like gender, age, race, and religion by AI systems [1, 6-10].…”
confidence: 99%