The majority of large-scale assessments develop various score scales that are either linear or nonlinear transformations of raw scores for better interpretations and uses of assessment results. The current formula for coefficient alpha (α; the commonly used reliability coefficient) only provides internal consistency reliability estimates of raw scores. This article presents a general form of α and extends its use to estimate internal consistency reliability for nonlinear scale scores (used for relative decisions). The article also examines this estimator of reliability using different score scales with real data sets of both dichotomously scored and polytomously scored items. Different score scales show different estimates of reliability. The effects of transformation functions on reliability of different score scales are also explored.
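As a point of reference, the conventional raw-score form of coefficient alpha can be sketched as follows. This is a minimal illustration, assuming an examinee-by-item matrix of raw item scores; the function name `coefficient_alpha` and the response data are hypothetical. The article's general form for nonlinear scale scores is not reproduced here, because the abstract does not state it.

```python
import numpy as np

def coefficient_alpha(item_scores):
    """Raw-score coefficient alpha for an examinee-by-item score matrix."""
    x = np.asarray(item_scores, dtype=float)
    n_items = x.shape[1]
    # Sample variance of each item across examinees
    item_variances = x.var(axis=0, ddof=1)
    # Sample variance of the total (summed) raw score
    total_variance = x.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical data: 5 examinees by 4 dichotomously scored items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = coefficient_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

The same function accepts polytomously scored items, since it relies only on item and total-score variances.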
Agreement between observations on two variables, for reliability or validation purposes, is usually assessed by evaluating the mean squared difference (MSD). Many transformations of MSD have been proposed to interpret and make statistical inferences about the agreement between the two variables, including the concordance correlation coefficient (CCC) and the random marginal agreement coefficient (RMAC). This paper presents a normalization of MSD based on a reference range and uses it to derive CCC and RMAC (or, alternatively, ACC). The normalization of MSD enables a comparison between these two coefficients. The paper thoroughly compares the two coefficients and their properties at different agreement levels. Results show that ACC has promising properties compared with CCC. Monte Carlo simulations and real-data applications are performed. ACC for more than two variables is also derived.
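The relationship between MSD and CCC can be sketched as below. This is a minimal illustration using the widely cited form of Lin's CCC, written as one minus the ratio of the observed MSD to the MSD expected if the two variables were uncorrelated. The function names and data are hypothetical, and the paper's reference-range normalization and the RMAC/ACC formulas are not reproduced, since the abstract does not give them.

```python
import numpy as np

def mean_squared_difference(x, y):
    """Mean of the squared paired differences between two variables."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((x - y) ** 2))

def concordance_correlation(x, y):
    """Lin's CCC expressed as 1 - MSD / (MSD expected with zero covariance)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population (1/n) variances, so that
    # MSD = var_x + var_y + (mean_x - mean_y)^2 - 2 * cov holds exactly
    msd_uncorrelated = x.var() + y.var() + (x.mean() - y.mean()) ** 2
    return 1.0 - mean_squared_difference(x, y) / msd_uncorrelated

# Hypothetical paired measurements: the second rater is shifted by a constant
rater_a = [1, 2, 3, 4, 5]
rater_b = [2, 3, 4, 5, 6]
msd = mean_squared_difference(rater_a, rater_b)
ccc = concordance_correlation(rater_a, rater_b)
print(f"MSD = {msd:.2f}, CCC = {ccc:.2f}")
```

In this example the two raters are perfectly correlated, yet the constant shift lowers CCC below 1, which is the sense in which CCC measures agreement rather than correlation alone.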
The current study explores the differences in metacognitive awareness perceptions of students who had high and low scores on TIMSS-like science tests. The sample consisted of 937 Omani students, 478 in Grade Five and 459 in Grade Nine. TIMSS-like tests were specially designed for both grade levels, and students also completed a metacognitive awareness perceptions inventory which explored their use of four main skills: planning, information management strategies, debugging strategies and evaluation. MANOVA was used to analyze the data. The findings indicated that students with high scores in the TIMSS-like test out-performed students with low scores in the test on all four metacognitive skills surveyed. This was true for all three performance areas analysed: performance in the TIMSS-like test as a whole, performance in lower-level test questions and performance in higher-level test questions. These findings highlight the extent to which students’ metacognitive skills influence their performance in science tests. The study recommends that students be trained to improve their metacognitive skills, reviews several methods for doing this, and suggests that such training might better prepare them for taking science tests. However, it also notes that further research is needed to explore the impact of metacognitive training on student performance in specific science examinations such as TIMSS.
The study aimed to investigate differential item functioning (DIF) in the verbal ability test items of the Gulf Multiple Mental Ability Scale (GMMAS) for female students in general, and Omani female students in particular, using the Mantel-Haenszel (MH) and Transformed Item Difficulty (TID) methods. The test consisted of 30 multiple-choice items with four distractors. The study sample consisted of archived data for 4280 third- and fourth-grade students in GCC countries. The results revealed that 60% of the items showed gender-related DIF using MH. Similarly, 60% of the items showed country-related DIF using MH. DIF values were small, indicating weak DIF in most items. Results also indicated that, using TID, 30% of the items showed gender-related DIF and 33.33% showed country-related DIF. Furthermore, agreement between the MH and TID methods for gender was moderate (kappa coefficient = 0.524), with an agreement ratio of 70%. Agreement between the two methods for country was weak (kappa coefficient = 0.158), with an agreement ratio of 46.67%. Based on these results, the researchers recommend investigating the reasons behind the detected differential functioning of some items in the verbal ability test at the second level of the GMMAS in order to avoid and address it.
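The agreement ratio and kappa coefficient between two DIF-detection methods can be sketched as follows. This is a minimal illustration, assuming each method yields a binary flag per item (DIF or no DIF); the function name `agreement_and_kappa` and the flag vectors are hypothetical. They are not the study's data, and the reported kappa values cannot be recomputed from the abstract alone.

```python
def agreement_and_kappa(flags_a, flags_b):
    """Observed agreement ratio and Cohen's kappa for two binary classifications."""
    n = len(flags_a)
    # Proportion of items on which the two methods give the same decision
    observed = sum(a == b for a, b in zip(flags_a, flags_b)) / n
    # Marginal proportion of items flagged by each method
    p_a = sum(flags_a) / n
    p_b = sum(flags_b) / n
    # Agreement expected by chance from the marginals alone
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical flags for 10 items (1 = item flagged as showing DIF)
mh_flags = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
tid_flags = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
ratio, kappa = agreement_and_kappa(mh_flags, tid_flags)
print(f"agreement ratio = {ratio:.2f}, kappa = {kappa:.2f}")
```

Kappa falls below the raw agreement ratio because it discounts the agreement that the two methods' flagging rates would produce by chance.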
In the Arab region, several assessments are available to evaluate student skills in mathematical computation. However, none of them uses formative evaluation to guide universal screening of struggling learners or students with learning disabilities (LD). The current study aimed to develop a mathematical computation curriculum-based measurement (MC-CBM) for Arabic-speaking fourth-grade students, examine its psychometric properties, test its adequacy for use in an Arab context, namely Oman, determine an adequate administration time, and develop performance benchmarks. The MC-CBM measures were administered to 528 fourth-grade students. Results indicated that the developed measures were adequate for use in the Arab context. Receiver operating characteristic (ROC) curve analysis indicated good specificity and sensitivity estimates for the MC-CBM. Performance benchmarks were obtained using the 25th and 75th percentiles. Implications are discussed from a contextual perspective.
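The screening quantities named above can be sketched as follows. This is a minimal illustration, assuming that low scores flag a student as at risk and that an external criterion labels who is truly at risk; the function name, scores, labels, and cut score are all hypothetical and are not taken from the study.

```python
import numpy as np

def sensitivity_specificity(scores, at_risk, cut_score):
    """Sensitivity and specificity when scores at or below cut_score flag risk."""
    scores = np.asarray(scores, dtype=float)
    at_risk = np.asarray(at_risk, dtype=bool)
    flagged = scores <= cut_score  # assumption: lower scores indicate risk
    true_positives = np.sum(flagged & at_risk)
    true_negatives = np.sum(~flagged & ~at_risk)
    sensitivity = true_positives / at_risk.sum()
    specificity = true_negatives / (~at_risk).sum()
    return float(sensitivity), float(specificity)

# Hypothetical screening scores and criterion labels (1 = truly at risk)
scores = [4, 6, 7, 9, 10, 12, 13, 15, 16, 18]
at_risk = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, at_risk, cut_score=8)

# Benchmarks at the 25th and 75th percentiles (NumPy's default linear interpolation)
p25, p75 = np.percentile(scores, [25, 75])
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"25th percentile = {p25}, 75th percentile = {p75}")
```

Repeating the sensitivity and specificity calculation over a range of cut scores gives the points of an ROC curve.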