Outcome tests are a popular method for detecting bias in lending, hiring, and policing decisions. These tests operate by comparing the success rate of decisions across groups. For example, if loans made to minority applicants are observed to be repaid more often than loans made to whites, it suggests that only exceptionally qualified minorities are granted loans, indicating discrimination. Outcome tests, however, are known to suffer from the problem of infra-marginality: even absent discrimination, the repayment rates for minority and white loan recipients might differ if the two groups have different risk distributions. Thus, at least in theory, outcome tests can fail to accurately detect discrimination. We develop a new statistical test of discrimination, the threshold test, that mitigates the problem of infra-marginality by jointly estimating decision thresholds and risk distributions. Applying our test to a dataset of 4.5 million police stops in North Carolina, we find that the problem of infra-marginality is more than a theoretical possibility, and can cause the outcome test to yield misleading results in practice.
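The infra-marginality problem described above can be illustrated with a minimal sketch. The group risk values, the threshold, and the function name `expected_hit_rate` are all hypothetical and chosen only for illustration; they do not come from the paper. Both groups face the same decision threshold, yet their expected success rates differ because their risk distributions differ.

```python
def expected_hit_rate(risks, threshold):
    """Mean success probability among individuals selected by the threshold.

    `risks` holds each individual's (hypothetical) probability of a
    successful outcome; an individual is selected when risk >= threshold.
    """
    selected = [r for r in risks if r >= threshold]
    return sum(selected) / len(selected) if selected else float("nan")


# Hypothetical risk distributions for two groups (illustrative only).
group_a = [0.1, 0.4, 0.9]
group_b = [0.1, 0.4, 0.5]

# The same threshold is applied to both groups, so there is no
# discrimination in the decision rule itself.
threshold = 0.3

rate_a = expected_hit_rate(group_a, threshold)
rate_b = expected_hit_rate(group_b, threshold)

# The outcome test compares these two rates; they differ even though
# the threshold is identical for both groups.
print(f"group A: {rate_a:.2f}, group B: {rate_b:.2f}")
```

In this toy setting an outcome test would flag a gap between the groups even though the decision rule is identical, which is the failure mode the abstract attributes to differing risk distributions.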
As technologies to defend against phishing and malware often impose an additional financial and usability cost on users (such as security keys), a question remains as to who should adopt these heightened protections. We measure over 1.2 billion email-based phishing and malware attacks against Gmail users to understand what factors place a person at heightened risk of attack. We find that attack campaigns are typically short-lived and at first glance indiscriminately target users on a global scale. However, by modeling the distribution of targeted users, we find that a person's demographics, location, email usage patterns, and security posture all significantly influence the likelihood of attack. Our findings represent a first step towards empirically identifying the most at-risk users.
In a variety of problem domains, it has been observed that the aggregate opinions of groups are often more accurate than those of the constituent individuals, a phenomenon that has been dubbed the “wisdom of the crowd”. However, due to the varying contexts, sample sizes, methodologies, and scope of previous studies, it has been difficult to gauge the extent to which conclusions generalize. To investigate this question, we carried out a large online experiment to systematically evaluate crowd performance on 1,000 questions across 50 topical domains. We further tested the effect of different types of social influence on crowd performance. For example, in one condition, participants could see the cumulative crowd answer before providing their own. In total, we collected more than 500,000 responses from nearly 2,000 participants. We have three main results. First, averaged across all questions, we find that the crowd indeed performs better than the average individual in the crowd—but we also find substantial heterogeneity in performance across questions. Second, we find that crowd performance is generally more consistent than that of individuals; as a result, the crowd does considerably better than individuals when performance is computed on a full set of questions within a domain. Finally, we find that social influence can, in some instances, lead to herding, decreasing crowd performance. Our findings illustrate some of the subtleties of the wisdom-of-crowds phenomenon, and provide insights for the design of social recommendation platforms.
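The comparison between the crowd and the average individual can be sketched as follows. This is a minimal illustration under stated assumptions: the estimates and the true value are made up, the crowd answer is taken to be the arithmetic mean, and error is measured as absolute deviation. The study itself may aggregate and score responses differently.

```python
def crowd_vs_individual(estimates, truth):
    """Return (crowd error, mean individual error) for one question.

    Assumes the crowd answer is the arithmetic mean of the estimates
    and that error is absolute deviation from the true value.
    """
    crowd_estimate = sum(estimates) / len(estimates)
    crowd_error = abs(crowd_estimate - truth)
    mean_individual_error = sum(abs(e - truth) for e in estimates) / len(estimates)
    return crowd_error, mean_individual_error


# Hypothetical estimates for a question whose true answer is 10.
crowd_error, mean_individual_error = crowd_vs_individual([8, 12, 15, 9], truth=10)
print(crowd_error, mean_individual_error)
```

Under these particular assumptions the crowd error can never exceed the mean individual error (by the triangle inequality), because individual over- and under-estimates partly cancel in the average.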
To assess racial disparities in police interactions with the public, we compiled and analyzed a dataset detailing over 60 million state patrol stops conducted in 20 U.S. states between 2011 and 2015. We find that black drivers are stopped more often than white drivers relative to their share of the driving-age population, but that Hispanic drivers are stopped less often than whites. Among stopped drivers, and after controlling for age, gender, time, and location, blacks and Hispanics are more likely to be ticketed, searched, and arrested than white drivers. These disparities may reflect differences in driving behavior, and are not necessarily the result of bias. In the case of search decisions, we explicitly test for discrimination by examining both the rate at which drivers are searched and the likelihood searches turn up contraband. We find evidence that the bar for searching black and Hispanic drivers is lower than for searching whites. Finally, we find that legalizing recreational marijuana in Washington and Colorado reduced the total number of searches and misdemeanors for all race groups, though a race gap still persists. We conclude by offering recommendations for improving data collection, analysis, and reporting by law enforcement agencies.
* This work was supported by the John S. and James L. Knight Foundation, and by the Hellman Fellows Fund. EP acknowledges support from a Hertz Fellowship and an NDSEG Fellowship, and SC acknowledges support from the Karr Family Graduate Fellowship. All data and analysis code are available at https://openpolicing.stanford.edu.