Quantitative performance ratings are ubiquitous in modern organizations, from businesses to universities, yet there is substantial evidence of bias against women in such ratings. This study examines how gender inequalities in evaluations depend on the design of the tools used to judge merit. Exploiting a quasi-natural experiment at a large North American university, we found that the number of scale points used in faculty teaching evaluations (whether instructors were rated on a scale of 6 versus a scale of 10) significantly affected the size of the gender gap in evaluations in the most male-dominated fields. A survey experiment, which presented all participants with an identical lecture transcript but randomly varied instructor gender and the number of scale points, replicated this finding and suggested that the number of scale points affects the extent to which gender stereotypes of brilliance are expressed in quantitative ratings. These results highlight how seemingly minor technical aspects of performance ratings can have a major effect on the evaluation of men and women. Our findings thus contribute to a growing body of work on organizational practices that reduce workplace inequalities and the sociological literature on how rating systems, rather than being neutral instruments, shape the distribution of rewards in organizations.

organizational practices that can reduce workplace inequalities (see Dobbin, Schrage, and Kalev 2015; Kalev, Dobbin, and Kelly 2006; Williams 2014) and by showing how the design of evaluation tools affects gender dynamics in organizations.

GENDER INEQUALITIES IN PERFORMANCE EVALUATIONS

Performance evaluations are ubiquitous in contemporary organizations (Castilla 2008). Following the scientific turn in management and the emergence of human resources departments as a bureaucratic form, performance evaluations gained popularity as a means to increase efficiency, standardize comparisons between workers, and reduce bias (Dobbin et al. 2015).
In the wake of equal opportunity legislation in employment, structured performance evaluations have also become important symbolic tools that organizations use to signal compliance with federal and state anti-discrimination laws (Dobbin 2009; Edelman 2016). Within the broad category of performance evaluations, numeric ratings are among the most common (Murphy and Cleveland 1995). Despite their intended purpose as "objective" measures of worker performance, a substantial body of research shows systematic bias in performance evaluations against particular groups of workers, including women. Through numerous laboratory and field studies, scholars have shown that women tend to receive significantly lower performance ratings than men, even when their behaviors or skill levels are identical (for a review, see Heilman 2001). When assigning holistic assessments of overall worker quality, managers not only hold women to higher standards in terms of both competence and warmth relative to men (Biernat, Tocci, and Williams 2012; Foschi 1996; Lyness and Heilman 2006), b...