We examine the way race and racial categories are adopted in algorithmic fairness frameworks. Current methodologies fail to adequately account for the socially constructed nature of race, instead adopting a conceptualization of race as a fixed attribute. Treating race as an attribute, rather than a structural, institutional, and relational phenomenon, can serve to minimize the structural aspects of algorithmic unfairness. In this work, we focus on the history of racial categories and turn to critical race theory and sociological work on race and ethnicity to ground conceptualizations of race for fairness research, drawing on lessons from public health, biomedical research, and social survey research. We argue that algorithmic fairness researchers need to take into account the multidimensionality of race, take seriously the processes of conceptualizing and operationalizing race, focus on social processes which produce racial inequality, and consider perspectives of those most affected by sociotechnical systems.
The ethical concept of fairness has recently been applied in machine learning (ML) settings to describe a wide range of constraints and objectives. When ethical concepts are brought to bear on subset selection problems, diversity and inclusion are additionally applicable for creating outputs that account for differentials in social power and access. We introduce metrics based on these concepts, which can be applied together, separately, and in tandem with additional fairness constraints. Results from human subject experiments lend support to the proposed criteria. Social choice methods can additionally be leveraged to aggregate and choose preferable sets, and we detail how these may be applied.
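To make the idea concrete, here is a minimal sketch of what set-level diversity and inclusion scores could look like. The abstract does not define the paper's actual metrics, so the formulas below (total-variation distance from a target group distribution for "diversity," and a viewer's own-group representation for "inclusion") are illustrative assumptions, not the authors' definitions.

```python
# Illustrative sketch only: these formulas are assumptions, not the
# metrics defined in the paper. Assumes target_dist covers all groups.
from collections import Counter

def diversity_score(subset_groups, target_dist):
    """1 minus the total-variation distance between the subset's
    observed group proportions and a target distribution."""
    n = len(subset_groups)
    observed = {g: c / n for g, c in Counter(subset_groups).items()}
    tv = 0.5 * sum(abs(observed.get(g, 0.0) - p)
                   for g, p in target_dist.items())
    return 1.0 - tv

def inclusion_score(subset_groups, viewer_group):
    """Share of the subset matching the viewer's group: a crude proxy
    for whether a viewer sees people like themselves represented."""
    return sum(g == viewer_group for g in subset_groups) / len(subset_groups)

# Example: a 5-item result set annotated by perceived group membership.
subset = ["A", "A", "B", "A", "C"]
print(diversity_score(subset, {"A": 0.5, "B": 0.25, "C": 0.25}))  # 0.9
print(inclusion_score(subset, "B"))  # 0.2
```

In a fuller system, scores like these could be combined with fairness constraints or fed into a social choice procedure for selecting among candidate subsets, as the abstract suggests.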
In response to growing concerns of bias, discrimination, and unfairness perpetuated by algorithmic systems, the datasets used to train and evaluate machine learning models have come under increased scrutiny. Many of these examinations have focused on the contents of machine learning datasets, finding glaring underrepresentation of minoritized groups. In contrast, relatively little work has been done to examine the norms, values, and assumptions embedded in these datasets. In this work, we conceptualize machine learning datasets as a type of informational infrastructure, and motivate genealogy as a method for examining the histories and modes of constitution at play in their creation. We present a critical history of ImageNet as an exemplar, utilizing critical discourse analysis of major texts around ImageNet’s creation and impact. We find that assumptions around ImageNet and other large computer vision datasets more generally rely on three themes: the aggregation and accumulation of more data, the computational construction of meaning, and making certain types of data labor invisible. By tracing the discourses that surround this influential benchmark, we contribute to the ongoing development of the standards and norms around data development in machine learning and artificial intelligence research.
Large-scale research on social movements requires detailed, recent, and specific data about protest events. Analyses of these data allow for new insights into movement emergence, consequences, and tactical innovation and adaptation. One issue with this kind of analysis, however, is that generating event data is incredibly costly: human coders must pore over news sources, looking for instances of protest and coding many variables by hand. Because of the high labor costs, projects are typically limited to one or two newspapers per country, which in turn exacerbates selection and description biases. This article aims to address this issue with the development, validation, and application of a system for automating the generation of protest event data. This system, called the Machine-Learning Protest Event Data System (MPEDS), is the first of its kind to come from within the social movement research community. MPEDS uses recent innovations from machine learning and natural language processing to generate protest event data with little to no human intervention. The system aims to increase the speed and reduce the labor costs associated with identifying and coding collective action events in news sources, thus increasing the timeliness of protest data and reducing biases due to excessive reliance on too few news sources. Work on MPEDS is ongoing, and to that end, the system will also be open, available for replication, and extendable by future social movement researchers and by social and computational scientists.
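As an illustration of the kind of automation described, here is a minimal sketch of a protest-event article classifier. MPEDS's actual models, features, and training data are not described above, so the TF-IDF representation, logistic regression classifier, and toy labeled examples below are assumptions made for illustration only.

```python
# A minimal sketch of the kind of pipeline such a system might use.
# The model choice (TF-IDF + logistic regression) and the toy data are
# assumptions; they are not MPEDS's documented architecture.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled articles: 1 = reports a protest event, 0 = does not.
articles = [
    "Hundreds marched downtown demanding a higher minimum wage.",
    "Students staged a sit-in at the administration building.",
    "The city council approved next year's transit budget.",
    "Local bakery wins award for best sourdough in the county.",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(articles, labels)

# Flag new articles as candidate protest events for downstream coding.
print(clf.predict(["Demonstrators rallied outside the courthouse Monday."]))
```

A real pipeline would likely train on many thousands of hand-coded articles and add downstream stages to extract the event variables (e.g., location, size, tactics) that human coders currently record by hand.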
Major political events now unfold in a hybrid political information cycle: even as millions of citizens tune in to television broadcasts, many also comment, and receive others' comments, over social media. In previous research, we have described how biobehavioral cues spur Twitter discussion of candidates during American presidential debates. Here we extend that research to account for other elements of the communication environment, in particular messages from political and media elites reaching viewers via a 'second screen' such as a mobile phone or tablet, and we apply our analyses to debates in both the United States and France. Specifically, we examine the relationship between the Twitter posts of 300 politicians, organizations, and media figures from each country and the relevant messages of the larger Twitterverse during the debates. Our findings reveal commonalities in social media response in the two countries, particularly the powerful role of party figures and pundits in spurring social media posting. We also note differences between the social media cultures of the two countries, including the finding that French elites commanded relatively more attention (in the form of retweets) than their American counterparts. Implications for debate evaluations and online expression are discussed.
There is a tendency across different subfields in AI to valorize a small collection of influential benchmarks. These benchmarks operate as stand-ins for a range of anointed common problems that are frequently framed as foundational milestones on the path towards flexible and generalizable AI systems. State-of-the-art performance on these benchmarks is widely understood as indicative of progress towards these long-term goals. In this position paper, we explore the limits of such benchmarks in order to reveal the construct validity issues in their framing as the functionally "general" broad measures of progress they are set up to be.