2021
DOI: 10.1177/20539517211035955
On the genealogy of machine learning datasets: A critical history of ImageNet

Abstract: In response to growing concerns of bias, discrimination, and unfairness perpetuated by algorithmic systems, the datasets used to train and evaluate machine learning models have come under increased scrutiny. Many of these examinations have focused on the contents of machine learning datasets, finding glaring underrepresentation of minoritized groups. In contrast, relatively little work has been done to examine the norms, values, and assumptions embedded in these datasets. In this work, we conceptualize machine…


Cited by 102 publications (47 citation statements)
References 41 publications
“…The methods presented in this article offer an alternative and potentially complementary approach to examining bias, contributing to efforts to localize (Loukissas, 2017), critique (Beaton, 2016), and contest (Denton et al., 2020) datasets. These methods draw on frameworks from the humanities rather than STEM fields for data critique, prompting us to treat datasets as cultural artifacts refracting the social and political contexts of their production as opposed to value-neutral artifacts that become distorted through special interest politics.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…While there are intersecting aims for engaging each of these methods, a distinct aim of a connotative reading is to situate data semantics historically and culturally in order to interpret how implied meanings are derived from data. In this sense, connotative readings advance efforts to document a “genealogy of datasets” (Denton et al., 2020). Sometimes, information pertinent to a connotative reading is written up in thoughtful data documentation.…”
Section: Reading Datasets Beyond the Neutrality Ideal (citation type: mentioning)
confidence: 99%
“…Sociologists can theorize these developments by examining how social inequalities are structured, highlighting political economy, capitalism, and colonial relations (Couldry & Mejias, 2019; Dyer‐Witheford et al., 2019; Shestakofsky, 2020). While macro‐level social theories provide analytic tools for global transformations, sociologists can attend to the production of power and knowledge through genealogies (Denton et al., 2021) and ethnographies of AI research (Hoffman, 2021; Jaton, 2021). There will also be continuing value in producing ethnographies (and institutional ethnographies, James & Whelan, 2021) of organizations implementing algorithmic systems (Bailey et al., 2020; Brayne & Christin, 2021; Cruz, 2020; Shestakofsky & Kelkar, 2020), as well as studies into the experiences of people who are further ‘downstream’, interacting with algorithmic systems (Christin, 2020; Noble, 2018).…”
Section: The Future of Inequality and Sociology's Response (citation type: mentioning)
confidence: 99%
“…However, even ImageNet [9], which was released in 2012 and remains one of the most popular datasets in the computer vision domain to this day [4,46], contains questionable content [3]. The issues this entails have been discussed for language models, for instance, models producing stereotypical and derogatory content [2], and for vision models and CV datasets, highlighting, e.g., gender and racial biases [10,29,44,48].…”
Section: Issues Arising from Large Datasets (citation type: mentioning)
confidence: 99%