The impact of revisions in the content of the SAT® and changes in the score scale on the predictive validity of the SAT were examined. Predictions of freshman grade‐point average (FGPA) for the entering class of 1994 (who had taken the old SAT) were compared with predictions for the class of 1995 (who had taken the new SAT I: Reasoning Test). The 1995 scores were evaluated both on the original SAT Program scale and on the recentered scale introduced that year. The changes in the test content and recentering of the score scale had virtually no impact on predictive validity. Other analyses indicated that the SAT I predicts FGPA about equally well across different ethnic groups. Correlations were slightly higher for higher levels of parental education and family income, and grades were more predictable for students with intended majors in math/science (mathematics, engineering, and biological or physical sciences) than for students with other intended majors. Correlations of the SAT I and the composite of SAT I scores and high school grade‐point average (HSGPA) with FGPA were generally higher for women than for men, although this pattern was reversed at colleges with very high mean SAT I scores. When a single prediction equation was used for all students, men tended to get lower grades than predicted and women got higher grades than predicted. African‐American and Hispanic/Latino men received lower grades than predicted, but women in these groups performed as predicted by the composite. Both men and women with intended majors in math/science got lower grades than would be predicted by an equation based on scores for all enrolled students.
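For readers unfamiliar with this design, a minimal sketch of the kind of composite prediction equation the abstract refers to (the least-squares form is standard for such studies; the coefficients and variable scaling are illustrative, not taken from the study):

\[
\widehat{\mathrm{FGPA}} = b_0 + b_1\,\mathrm{SAT} + b_2\,\mathrm{HSGPA}
\]

Under- and overprediction for a group are then read off the residuals: students whose earned grades satisfy \(\mathrm{FGPA} < \widehat{\mathrm{FGPA}}\) are "overpredicted" (they received lower grades than the common equation predicted), as reported above for men and for students with intended math/science majors.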
With the pressing need for accountability in higher education, standardized outcomes assessments have been widely used to evaluate learning and inform policy. However, the critical question of how scores are influenced by students' motivation has been insufficiently addressed. Using random assignment, we administered a multiple-choice test and an essay across three motivational conditions. Students' self-reported motivation was also collected. Motivation significantly predicted test scores. A substantial performance gap emerged between students in different motivational conditions (effect size as large as d = 0.68). Depending on the test format and condition, conclusions about college learning gain (i.e., value added) varied dramatically, from substantial gain (d = 0.72) to negative gain (d = −0.23). The findings have significant implications for higher education stakeholders at many levels.
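For reference, the gains and gaps above are standardized mean differences (Cohen's d); a minimal sketch of the usual formula, with generic group labels rather than the study's conditions:

\[
d = \frac{\bar{X}_1 - \bar{X}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},
\]

so, for example, d = 0.72 indicates that one group's mean score exceeds the other's by roughly three-quarters of a pooled standard deviation.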
Open-ended counterparts to a set of items from the quantitative section of the Graduate Record Examination (GRE-Q) were developed. Examinees responded to these items by gridding a numerical answer on a machine-readable answer sheet or by typing on a computer. The test section with the special answer sheets was administered at the end of a regular GRE administration. Test forms were spiraled so that random groups received either the grid-in questions or the same questions in a multiple-choice format. In a separate data collection effort, 364 paid volunteers who had recently taken the GRE used a computer keyboard to enter answers to the same set of questions. Despite substantial format differences noted for individual items, total scores for the multiple-choice and open-ended tests demonstrated remarkably similar correlational patterns. There were no significant interactions of test format with either gender or ethnicity.

Quantitative items presented in an open-ended response format offer at least three major advantages over their multiple-choice counterparts. First, they reduce measurement error by eliminating random guessing. This is particularly valuable in an adaptive testing situation, where branching decisions might be made on the basis of responses to one or two items. Second, they eliminate the unintended corrective feedback that is inherent in multiple-choice items: if the answer computed by the examinee is not among the answer choices, the examinee knows that an error was made and may try a different strategy to compute the correct answer. (To the extent that such feedback reduces trivial computational errors, its absence could be considered a disadvantage of open-ended questions.) Third, problems cannot be solved by working backwards from the answer choices. For example, an algebra problem such as 2(x + 4) = 38 − x becomes a much simpler arithmetic problem if the examinee can just substitute the possible values of x given in the answer choices until the correct value is found (a worked version appears below). Because this last advantage makes test items more like the kinds of problems students must solve in their academic work, it enhances the face validity and should also improve the construct validity of the test.

Although there are sound logical grounds for supposing that the cognitive demands of open-ended and multiple-choice quantitative items could be quite different, empirical evidence that the two item types assess distinctive traits is lacking. A recent review by Traub and MacRury (1990) suggests that there is evidence that some free-response essay tests measure different…
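As promised above, a minimal worked version of the back-solving example (the candidate value x = 10 is used purely for illustration; the source does not list the actual answer choices). Solving directly requires algebra:

\[
2(x + 4) = 38 - x
\;\Longrightarrow\;
2x + 8 = 38 - x
\;\Longrightarrow\;
3x = 30
\;\Longrightarrow\;
x = 10.
\]

Working backwards from a multiple-choice option requires only arithmetic: substituting x = 10 gives 2(10 + 4) = 28 on the left and 38 − 10 = 28 on the right, so the choice verifies without any algebraic manipulation, which is the advantage of the open-ended format described above.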
This study examined the relationship between scores on the TOEFL Internet-Based Test (TOEFL iBT®) and academic performance in higher education, defined here in terms of grade point average (GPA). The academic records of 2,594 undergraduate and graduate students were collected from 10 universities in the United States. The data consisted of students' GPAs, detailed course information, and admissions-related test scores, including TOEFL iBT, GRE, GMAT, and SAT scores. Correlation-based analyses were conducted for subgroups defined by academic status and discipline. Expectancy graphs complemented the correlation-based analyses by expressing predictive validity as the proportion of individuals in each TOEFL iBT score subgroup who fell into each GPA subgroup. The predictive validity expressed in terms of correlation did not appear to be strong. Nevertheless, the general pattern shown in the expectancy graphs indicated that students with higher TOEFL iBT scores tended to earn higher GPAs, and that the TOEFL iBT provided information about the future academic performance of non-native English-speaking students beyond that provided by other admissions tests. These observations led us to conclude that even a small correlation might indicate a meaningful relationship between TOEFL iBT scores and GPA. Limitations and implications are discussed.