The wide use of multiple-choice (MC) items across different evaluation contexts highlights the importance of developing and using them correctly. To enhance the validity of scores obtained from MC items, several authors have developed fundamental guidelines for MC item construction. Haladyna and Downing (1989a) laid the groundwork for MC item writing by analyzing 46 textbooks and other sources, from which they proposed 43 consensus guidelines. The same authors also reviewed more than 90 studies to explore the validity of their recommendations and found that more than half of the guidelines had not been investigated at all (Haladyna & Downing, 1989b). In a replication of the latter review, Haladyna, Downing, and Rodriguez (2002) validated the original taxonomy of 43 item-writing rules and reduced it to 31 guidelines, which have recently been reorganized and updated (Haladyna & Rodriguez, 2013). Other taxonomies for developing MC items were proposed by Frey, Petersen, Edwards, Teramoto Pedrotti, and Peyton (2005) and by Moreno, Martínez, and Muñiz (2006); these offered essentially the same advice as Haladyna et al. (2002). The latest pieces of advice for

ABSTRACT

Multiple-choice items are extensively used across different assessment contexts. A crucial requirement for ensuring their validity is their correct development, and a number of item-writing guidelines have been proposed to support item developers. This experimental pilot study investigated the effect of violating two item-writing guidelines: the differential length of the correct option compared with the distractors, and the lexical overlap of the correct option with the stem. Standard and flawed items, respectively adhering to and deviating from the guidelines, were randomly assigned to 55 college students and compared in their psychometric functioning.
Results indicated that, in general, flawed items tended to be easier and less subject to random answers than standard ones, but significant differences were few. Discrepancies between standard and flawed subtests approached statistical significance with medium effect sizes. Although of interest, findings must be cautiously interpreted due to the small sample size. Implications for future research are discussed.
Differential length and overlap with the stem in multiple-choice item options: a pilot experiment

RESUMEN

Multiple-choice items are widely used in a broad range of assessment contexts. A very important requirement for ensuring their validity is that they be written correctly, and a series of guidelines has been developed to help achieve this. The aim of this experimental pilot study was to investigate the effect of violating two of these rules, specifically the differential length of the correct option compared with the distractors and its lexical overlap with the stem. To this end, 55 students were randomly assigned to the conditions of answering items that either adhered to or violated these guidelines, and were compared...