2011
DOI: 10.1080/15434303.2010.536924

Item Writer Judgments of Item Difficulty Versus Actual Item Difficulty: A Case Study

Abstract: This study investigates how accurate one item writer can be on item difficulty estimates and whether factors affecting item writer judgments correspond to predictors of actual item difficulty. The items were based on conversational dialogs (presented as videos online) that focus on pragmatic functions. Thirty-five 2nd-, 3rd-, and 4th-year learners of Russian participated. The results indicated that the hypothesized item difficulty was a weak, although significant, predictor of actual item difficulty. Intermedi…

References: 38 publications
Cited by: 10 publications (9 citation statements)
“…In contrast, our result partly refutes some previous studies (Alderson and Lukmani 1989; Sydorenko 2011) that claim that (experienced) item writer intuitions are weak predictor of item difficulty. We propose this difference is the result of the training that the CEFR-relating judges received.…”
Section: Discussion; citation type: contrasting (confidence: 57%)
“…The reason for including the language expert into the study is twofold. First, in our context, most of the language experts participating in the CEFR alignment project are also item-writers for the national examinations, and second we want to address the question of experts and their reported weak ability to predict the item/task difficulty (Alderson and Lukmani 1989; Sydorenko 2011). In addition, the in-depth analysis of the test-takers' while-reading questionnaire is employed to identify the underlying factors that can contribute to the item/task difficulty and influence test-taker performance.…”
Citation type: mentioning (confidence: 99%)
“…Instead, difficulty ratings from experts are used as approximations of real item difficulty and as a basis for distributing items within a design. Several studies have reported medium to high correlations between such ratings and real item difficulties (e.g., Bejar, ; Hambleton & Jirka, ; Sydorenko, ; Wauters et al., ). Therefore, we investigated the extent to which misclassification of items in terms of their difficulty could impair the efficiency of the different calibration designs considered in this study.…”
Section: Discussion; citation type: mentioning (confidence: 99%)
“…Several studies have investigated the accuracy with which experts, such as test developers, content experts, or item authors, can rate item difficulty, and they have found moderate to high correlations between the ratings and the empirical item difficulties (e.g., Bejar, ; Hambleton & Jirka, ; Sydorenko, ; Wauters, Desmet, & van den Noortgate, ). The accuracy of difficulty ratings depends on several factors, such as the content, item type, training of the judges, and number of judges (Hambleton & Jirka, , pp.…”
Section: Accuracy and Efficiency In Rasch Model–based Item Calibration; citation type: mentioning (confidence: 99%)
“…Furthermore, it might be more challenging to create items targeted at higher competence levels and to predict their true difficulty (cf. Sydorenko, 2011). Further studies with a stronger focus on mathematics didactics and competence development are required to evaluate these hypotheses and investigate whether the competence levels described for secondary school fulfill the basic precondition for a unidimensional vertical scale of a continuous increase in the target competence over time (Young, 2006).…”
Section: Main Effects; citation type: mentioning (confidence: 99%)