Samira Samadi scite author profile

Vempala

2021

We show that the popular k-means clustering algorithm (Lloyd's heuristic), used for a variety of scientific data, can result in outcomes that are unfavorable to subgroups of data (e.g., demographic groups). Such biased clusterings can have deleterious implications for human-centric applications such as resource allocation. We present a fair k-means objective and algorithm to choose cluster centers that provide equitable costs for different groups. The algorithm, Fair-Lloyd, is a modification of Lloyd's heuristic for k-means, inheriting its simplicity, efficiency, and stability. In comparison with standard Lloyd's, we find that on benchmark data sets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have balanced costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.
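The idea of "equitable costs for different groups" can be illustrated with a small sketch. It is not the Fair-Lloyd algorithm itself. It only evaluates a fixed set of centers, assuming that a group's cost is its average squared distance to the nearest center and that the fair objective is the largest such cost. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def group_average_costs(points, groups, centers):
    """Average squared distance to the nearest center, computed per group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for point, group in zip(points, groups):
        totals[group] += min(squared_distance(point, c) for c in centers)
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


def fair_cost(points, groups, centers):
    """Assumed fair objective: the largest per-group average cost."""
    return max(group_average_costs(points, groups, centers).values())


# Toy data (hypothetical): group "a" sits exactly on a center, group "b" does not.
points = [(0.0, 0.0), (0.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
groups = ["a", "a", "b", "b"]
centers = [(0.0, 0.0), (4.0, 0.0)]

costs = group_average_costs(points, groups, centers)
worst = fair_cost(points, groups, centers)
print(costs, worst)
```

On this toy input the overall average cost is low, yet all of it falls on group "b"; a fairness-aware choice of centers would try to reduce the larger of the two group costs rather than the overall average.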

Finding Meaningful Cluster Structure Amidst Background Noise

Kushagra

Ben-David

2016

Pairwise Fairness for Ordinal Regression

Kleindeßner¹,

Samadi²,

Zafar³

et al. 2021

Preprint

We initiate the study of fairness for ordinal regression, or ordinal classification. We adapt two fairness notions previously considered in fair ranking and propose a strategy for training a predictor that is approximately fair according to either notion. Our predictor consists of a threshold model, composed of a scoring function and a set of thresholds, and our strategy is based on a reduction to fair binary classification for learning the scoring function and local search for choosing the thresholds. We can control the extent to which we care about the accuracy vs the fairness of the predictor via a parameter. In extensive experiments we show that our strategy allows us to effectively explore the accuracy-vs-fairness trade-off and that it often compares favorably to "unfair" state-of-the-art methods for ordinal regression in that it yields predictors that are only slightly less accurate, but significantly more fair.

Introduction

As machine learning (ML) algorithms have become an integral part of numerous human-centric domains, they have been observed showing a range of concerning behaviors: facial recognition systems having higher accuracy on white male faces than on darker-skinned or female ones (Buolamwini and Gebru, 2017); criminal recidivism tools mislabeling black low-risk defendants as high-risk and white high-risk defendants as low-risk (Angwin et al., 2016); word2vec embeddings encoding stereotypes such as "father is to a doctor as a mother is to a nurse" (Bolukbasi et al., 2016); and image search systems answering the query "CEO" with a much higher fraction of images of men compared to the real-world fraction of male CEOs (Kay et al., 2015), to name just the most prominent examples. These observations have led to the study of fairness in ML (Barocas et al., 2019), and in the past years numerous ML tasks have been studied from a fairness perspective.

While most works consider (binary) classification (e.g., Hardt et al., 2016), fair algorithms have also been developed for regression (e.g.,

Multi-Criteria Dimensionality Reduction with Applications to Fairness

Tantipongpipat¹,

Samadi²,

Singh³

et al. 2019

Preprint

Dimensionality reduction is a classical technique widely used for data analysis. One foundational instantiation is Principal Component Analysis (PCA), which minimizes the average reconstruction error. In this paper, we introduce the multicriteria dimensionality reduction problem where we are given multiple objectives that need to be optimized simultaneously. As an application, our model captures several fairness criteria for dimensionality reduction such as the Fair-PCA problem introduced by Samadi et al. [2018] and the Nash Social Welfare (NSW) problem. In the Fair-PCA problem, the input data is divided into k groups, and the goal is to find a single d-dimensional representation for all groups for which the maximum reconstruction error of any one group is minimized. In NSW the goal is to maximize the product of the individual variances of the groups achieved by the common low-dimensional space. Our main result is an exact polynomial-time algorithm for the two-criterion dimensionality reduction problem when the two criteria are increasing concave functions. As an application of this result, we obtain a polynomial time algorithm for Fair-PCA for k = 2 groups, resolving an open problem of Samadi et al. [2018], and a polynomial time algorithm for NSW objective for k = 2 groups. We also give approximation algorithms for k > 2. Our technical contribution in the above results is to prove new low-rank properties of extreme point solutions to semi-definite programs. We conclude with experiments indicating the effectiveness of algorithms based on extreme point solutions of semi-definite programs on several real-world datasets.
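The Fair-PCA objective described above, minimizing the maximum reconstruction error over groups, can be made concrete with a small evaluation sketch. It does not implement the paper's semi-definite programming approach. It only scores candidate one-dimensional subspaces on toy 2-D data, assuming a group's error is its average squared reconstruction error. All names and data are hypothetical.

```python
import math


def reconstruction_error(x, unit):
    """Squared error after projecting x onto the line spanned by a unit vector."""
    dot = sum(a * b for a, b in zip(x, unit))
    return sum((a - dot * b) ** 2 for a, b in zip(x, unit))


def max_group_error(data_by_group, direction):
    """Largest per-group average reconstruction error for a 1-D subspace."""
    norm = math.sqrt(sum(b * b for b in direction))
    unit = [b / norm for b in direction]  # normalize so the projection is valid
    return max(
        sum(reconstruction_error(x, unit) for x in rows) / len(rows)
        for rows in data_by_group.values()
    )


# Toy data (hypothetical): group "a" varies along x, group "b" along y.
data = {
    "a": [(1.0, 0.0), (-1.0, 0.0)],
    "b": [(0.0, 1.0), (0.0, -1.0)],
}

along_x = max_group_error(data, (1.0, 0.0))   # perfect for "a", poor for "b"
diagonal = max_group_error(data, (1.0, 1.0))  # splits the error between groups
print(along_x, diagonal)
```

In this example the axis-aligned direction reconstructs one group perfectly at the other's expense, while the diagonal gives both groups the same, smaller worst-case error, which is the kind of trade-off the min-max objective rewards.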

Usability of Humanly Computable Passwords

Vempala

Kalai

2018

HCOMP

Reusing passwords across multiple websites is a common practice that compromises security. Recently, Blum and Vempala have proposed password strategies to help people calculate, in their heads, passwords for different sites without dependence on third-party tools or external devices. Thus far, the security and efficiency of these "mental algorithms" have been analyzed only theoretically. But are such methods usable? We present the first usability study of humanly computable password strategies, involving a learning phase (to learn a password strategy), then a rehearsal phase (to log in to a few websites), and multiple follow-up tests. In our user study, with training, participants were able to calculate a deterministic eight-character password for an arbitrary new website in under 20 seconds.

Usability of Humanly Computable Passwords

Samadi¹,

Vempala²,

Kalai³

2017

Preprint

Socially Fair k-Means Clustering

Ghadiri¹,

Samadi²,

Vempala³

2020

Preprint

Near-optimal Herding

Harvey

2014