The validity of studies investigating interventions to enhance fluid intelligence (Gf) depends on the adequacy of the Gf measures administered. Such studies have yielded mixed results, and it has been suggested that Gf measurement issues may be partly responsible. The purpose of this study was to develop a Gf test battery comprising tests that (a) have strong construct validity evidence, based on prior research; (b) are reliable and sensitive to change; (c) vary in item types and content; (d) yield parallel tests, so that pretest-posttest comparisons can be made; (e) have appropriate time limits; (f) are unidimensional, to facilitate interpretation; and (g) are appropriately difficult for a high-ability population, so that change can be detected. A battery comprising letter series, number series, figure series, and figural matrix item types was developed and evaluated in three large-N studies (N = 3,067, 2,511, and 801, respectively). Items were generated algorithmically from item models proven in the literature, to achieve high reliability at the targeted difficulty levels. An item response theory (IRT) approach was used to calibrate the items in the first two studies and to establish conditional reliability targets for the tests and the battery. On the basis of those calibrations, fixed parallel forms were assembled for the third study using linear programming methods. Analyses showed that the tests and the test battery met the proposed criteria. We suggest that the battery as constructed is a promising tool for measuring the effectiveness of cognitive enhancement interventions, and that its algorithmic item construction allows the battery to be tailored to different difficulty targets for even wider applications.

Keywords: Intelligence · Fluid ability · Gf · Working memory training · Reasoning · Item response theory · Test assembly

General fluid ability (Gf) is "at the core of what is normally meant by intelligence" (Carroll, 1993, p. 196) and has been shown empirically to be synonymous with general cognitive ability (g), at least within groups with roughly comparable opportunities to learn (Valentin Kvist & Gustafsson, 2008). Gf has been viewed as an essential determinant of one's ability to solve a wide range of novel real-world problems (Schneider & McGrew, 2012). Perhaps because of its association with diverse outcomes, there has been a longstanding interest in improving Gf (i.e., intelligence) through general schooling