Research has often found that, when high school grades and SAT scores are used to predict first‐year college grade‐point average (FGPA) via regression analysis, African‐American and Latino students are, on average, predicted to earn higher FGPAs than they actually do. Under various plausible models, this phenomenon can be explained in terms of the unreliability of predictor variables. Attributing overprediction to measurement error, however, is not fully satisfactory: Might the measurement errors in the predictor variables be systematic in part, and could they be reduced? The research hypothesis in the current study was that the overprediction of Latino and African‐American performance occurs, at least in part, because these students are more likely than White students to attend high schools with fewer resources. The study provided some support for this hypothesis and showed that the prediction of college grades can be improved using information about high school socioeconomic status. An interesting peripheral finding was that grades provided by students’ high schools were stronger predictors of FGPA than were students’ self‐reported high school grades. Correlations between the two types of high school grades (computed for each of 18 colleges) ranged from .59 to .85.
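The claim that predictor unreliability alone can produce overprediction can be illustrated with a small simulation. The sketch below is not the study's model or data: the two group labels, the group means, and the error standard deviations are all hypothetical values chosen for illustration. Both groups share the same true relationship between preparation and outcome, but the predictor is observed with random error, and a single pooled regression line is fitted.

```python
import random
from statistics import fmean


def simulate(n_per_group=20000, predictor_error_sd=0.7, seed=0):
    """Fit one pooled regression on an error-laden predictor (illustrative only)."""
    rng = random.Random(seed)
    rows = []  # (group label, observed predictor x, outcome y)
    # Hypothetical groups that differ only in mean true preparation.
    for group, true_mean in (("higher", 0.5), ("lower", -0.5)):
        for _ in range(n_per_group):
            t = rng.gauss(true_mean, 1.0)                # true preparation
            x = t + rng.gauss(0.0, predictor_error_sd)   # unreliable predictor
            y = t + rng.gauss(0.0, 0.5)                  # outcome depends on t only
            rows.append((group, x, y))

    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Ordinary least squares for a single predictor.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x

    # Mean residual (actual minus predicted) within each group.
    mean_residual = {
        g: fmean([y - (intercept + slope * x) for grp, x, y in rows if grp == g])
        for g in ("higher", "lower")
    }
    return slope, mean_residual


slope, mean_residual = simulate()
print(f"pooled slope: {slope:.3f}")
print({g: round(r, 3) for g, r in mean_residual.items()})
```

Under these assumptions the pooled slope is attenuated below 1, so the group with the lower mean is predicted to score higher than it actually does (a negative mean residual) even though the true relationship is identical in both groups. Setting `predictor_error_sd=0` removes the effect.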
Objective: This article presents health science educators and researchers with an overview of standardized testing in educational measurement. The history of testing, the theoretical frameworks of classical test theory and item response theory (IRT), and the most common IRT models used in modern testing are presented. Methods: A narrative overview of the history, theoretical concepts, test theory, and IRT is provided to familiarize the reader with these concepts of modern testing. Examples of data analyses using different models are shown using 2 simulated data sets. One set consisted of a sample of 2000 item responses to 40 multiple-choice, dichotomously scored items. This set was used to fit 1-parameter logistic (1PL), 2PL, and 3PL IRT models. Another data set was a sample of 1500 item responses to 10 polytomously scored items. The second data set was used to fit a graded response model. Results: Model-based item parameter estimates for 1PL, 2PL, 3PL, and graded response models are presented, evaluated, and explained. Conclusion: This study provides health science educators and education researchers with an introduction to educational measurement. The history of standardized testing, the frameworks of classical test theory and IRT, and the logic of scaling and equating are presented. This introductory article will aid readers in understanding these concepts.
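For readers unfamiliar with the models named above, the following is a minimal sketch of their standard textbook forms. It is not the article's code. The function names and the parameter values in the usage lines are illustrative assumptions, and the 1PL is written here with discrimination fixed at 1, which is one common convention.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing).
    With c=0 this reduces to the 2PL; with c=0 and a=1, to the 1PL.
    """
    return c + (1.0 - c) * logistic(a * (theta - b))


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    thresholds must be strictly increasing; K thresholds give K+1
    ordered categories. Each probability is the difference between
    adjacent cumulative probabilities P(score >= k).
    """
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Illustrative (made-up) parameter values.
print(irf_3pl(0.0))                        # 1PL item of average difficulty
print(irf_3pl(0.0, a=2.0, b=0.0, c=0.2))   # 3PL item with guessing
print(grm_category_probs(0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0]))
```

Writing the 1PL and 2PL as special cases of one function makes the nesting of the three dichotomous models explicit. The graded response function returns probabilities that sum to 1 across categories.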
Cross-cultural research on children's theory of mind (ToM) understanding has raised questions about its developmental sequence and relationship with executive function (EF). The current study examined how ToM develops (using the tasks from Wellman & Liu, 2004) in relation to 2 EF skills (conflict inhibition, working memory) in 997 Chinese preschoolers (ages 3, 4, 5) in Chengdu, China. Compared with prior research with other Chinese and non-Chinese children, some general patterns in development were replicated in this sample. However, the children showed culture-specific reversals in the developmental sequence of ToM. For example, Chengdu children performed differently on the 2 false-belief tasks that were thought to be equivalent. Furthermore, conflict inhibition as well as working memory uniquely predicted ToM performance. We discuss the issues of ToM development as they relate to test items and cross-cultural--and subcultural--differences.
Objective: The objectives of this study were to (1) identify factors predictive of performance on the National Board of Chiropractic Examiners Part IV exam and (2) investigate correlations between the scores obtained in the Part I, Part II, Physiotherapy, and Part III exams and the Part IV examination. Methods: A random sample of 1341 records was drawn from National Board of Chiropractic Examiners data to investigate the relationships between the scores obtained on the National Board of Chiropractic Examiners exams. A hierarchical multiple regression analysis related the performance on Part IV to examinees' gender, Part IV repeater status, and scores obtained on the Part I, Part II, Physiotherapy, and Part III exams. Results: The analyses revealed statistical relations among all National Board of Chiropractic Examiners exams. The correlations between Part IV and Part I ranged from r = .31 to r = .40; between Part IV and Part II from r = .34 to r = .45. The correlation between Part IV and Physiotherapy was r = .44; between Part IV and Part III was r = .46. The strongest predictors of the Part IV score were found to be examinees' scores in Diagnostic Imaging, β̂ = .19, p < .001; Chiropractic Practice, β̂ = .17, p < .001; Physiotherapy, β̂ = .15, p < .001; and the Part III exam, β̂ = .19, p < .001. Conclusions: Performance on the National Board of Chiropractic Examiners Part IV examination is related to the performance in all other National Board of Chiropractic Examiners exams.