2021
DOI: 10.1177/20539517211035955
On the genealogy of machine learning datasets: A critical history of ImageNet

Abstract: In response to growing concerns of bias, discrimination, and unfairness perpetuated by algorithmic systems, the datasets used to train and evaluate machine learning models have come under increased scrutiny. Many of these examinations have focused on the contents of machine learning datasets, finding glaring underrepresentation of minoritized groups. In contrast, relatively little work has been done to examine the norms, values, and assumptions embedded in these datasets. In this work, we conceptualize machine…


Cited by 102 publications (47 citation statements)
References 41 publications
“…The methods presented in this article offer an alternative and potentially complementary approach to examining bias, contributing to efforts to localize (Loukissas, 2017), critique (Beaton, 2016), and contest (Denton et al., 2020) datasets. These methods draw on frameworks from the humanities rather than STEM fields for data critique, prompting us to treat datasets as cultural artifacts refracting the social and political contexts of their production as opposed to value-neutral artifacts that become distorted through special interest politics.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…While there are intersecting aims for engaging each of these methods, a distinct aim of a connotative reading is to situate data semantics historically and culturally in order to interpret how implied meanings are derived from data. In this sense, connotative readings advance efforts to document a “genealogy of datasets” (Denton et al., 2020). Sometimes, information pertinent to a connotative reading is written up in thoughtful data documentation.…”
Section: Reading Datasets Beyond the Neutrality Ideal (citation type: mentioning)
confidence: 99%
“…Sociologists can theorize these developments by examining how social inequalities are structured, highlighting political economy, capitalism, and colonial relations (Couldry & Mejias, 2019; Dyer‐Witheford et al., 2019; Shestakofsky, 2020). While macro‐level social theories provide analytic tools for global transformations, sociologists can attend to the production of power and knowledge through genealogies (Denton et al., 2021) and ethnographies of AI research (Hoffman, 2021; Jaton, 2021). There will also be continuing value in producing ethnographies (and institutional ethnographies, James & Whelan, 2021) of organizations implementing algorithmic systems (Bailey et al., 2020; Brayne & Christin, 2021; Cruz, 2020; Shestakofsky & Kelkar, 2020), as well as studies into the experiences of people who are further ‘downstream’, interacting with algorithmic systems (Christin, 2020; Noble, 2018).…”
Section: The Future of Inequality and Sociology's Response (citation type: mentioning)
confidence: 99%
“…However, even ImageNet [9], which was released in 2012 and remains one of the most popular datasets in the computer vision domain to this day [4,46], contains questionable content [3]. The issues this entails have been discussed for language models, for instance, models producing stereotypical and derogatory content [2], and for vision models and CV datasets, highlighting, e.g., gender and racial biases [10,29,44,48].…”
Section: Issues Arising from Large Datasets (citation type: mentioning)
confidence: 99%