Evaluating subscore uses across multiple levels: A case of reading and listening subscores for young EFL learners

Choi, Ikkyu; Papageorgiou, Spiros

doi:10.1177/0265532219879654

Cited by 7 publications (9 citation statements); references 50 publications.


“…For instance, TOEFL iBT listening section was composed of 34 listening items measuring three major listening subskills (Lee & Sawaki, 2009) and shortened to 28 listening items in the Shorter TOEFL iBT® Test starting from August 1, 2019. The TOEFL Primary listening section consists of 30 items that assess four communication goals (Choi & Papageorgiou, 2020). It is therefore understandable that previous research on providing subscores has generally produced unsatisfactory results, claiming that subscores are not of adequate quality psychometrically (Papageorgiou & Choi, 2018), and that subscore-based inferences are supported only at group level but not at individual test taker level (Choi & Papageorgiou, 2020).…”

Section: Discussion (mentioning, confidence: 99%)
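The quoted argument is that a 30-item section split across four communication goals leaves too few items per goal for dependable subscores. The Spearman-Brown prophecy formula makes this concrete. This is a minimal sketch under stated assumptions: the section reliability of 0.85 is a hypothetical value, not one reported in the cited studies. The four subscores are also treated as equal-length item sets that are parallel to the full section.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length by `length_factor`.

    Assumes the retained items are parallel to the full set of items.
    """
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)


section_items = 30          # TOEFL Primary listening items, as quoted above
n_subscores = 4             # communication goals, as quoted above
section_reliability = 0.85  # ASSUMED: hypothetical value, not from the cited studies

items_per_subscore = section_items / n_subscores  # 7.5 items on average
k = items_per_subscore / section_items            # each subscore is 1/4 of the section
print(f"Predicted subscore reliability: {spearman_brown(section_reliability, k):.3f}")
```

Under these assumptions, cutting the test to a quarter of its length lowers the predicted reliability from 0.85 to roughly 0.59. This illustrates why short subscores tend to fall short psychometrically.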

“…The TOEFL Primary listening section consists of 30 items that assess four communication goals (Choi & Papageorgiou, 2020). It is therefore understandable that previous research on providing subscores has generally produced unsatisfactory results, claiming that subscores are not of adequate quality psychometrically (Papageorgiou & Choi, 2018), and that subscore-based inferences are supported only at group level but not at individual test taker level (Choi & Papageorgiou, 2020). This study has demonstrated that CDA analyses, by using a smaller number of latent scale points in estimation than item response theory (IRT) analyses (Templin & Bradshaw, 2013), provide acceptable classification reliability and thus the possibility to meet the substantial demand from test users on more detailed information.…”

Section: Discussion (mentioning, confidence: 99%)
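The claim that CDA yields acceptable classification reliability depends on how confidently each test taker is assigned to a mastery state. One common family of indices works directly from each test taker's posterior probability p of mastering an attribute. Expected classification accuracy averages max(p, 1 - p). Expected consistency averages p² + (1 - p)². The consistency index rests on a simplifying assumption: each of two parallel administrations classifies the test taker as a master with probability p. The sketch below is not the reliability method of Templin and Bradshaw (2013) or of the citing study. The function names and posterior values are illustrative only.

```python
from statistics import mean


def attribute_accuracy(posteriors: list[float]) -> float:
    """Expected proportion of correct master/non-master classifications.

    Each test taker is assigned to the more probable state, so the chance of
    a correct assignment is max(p, 1 - p).
    """
    return mean(max(p, 1 - p) for p in posteriors)


def attribute_consistency(posteriors: list[float]) -> float:
    """Expected agreement between two parallel administrations.

    Assumes each administration classifies the test taker as a master with
    probability p, independently, so agreement is p**2 + (1 - p)**2.
    """
    return mean(p**2 + (1 - p) ** 2 for p in posteriors)


# Hypothetical posterior mastery probabilities for one listening subskill.
posteriors = [0.97, 0.91, 0.12, 0.05, 0.88, 0.40, 0.99, 0.03]
print(f"accuracy    = {attribute_accuracy(posteriors):.3f}")
print(f"consistency = {attribute_consistency(posteriors):.3f}")
```

Posteriors near 0 or 1 push both indices toward 1. The ambiguous test taker at 0.40 is what pulls them down. Because a diagnostic model places each test taker on only two points per attribute, such indices can stay high even when a continuous IRT subscore would be unreliable.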


Developing individualized feedback for listening assessment: Combining standard setting and cognitive diagnostic assessment approaches

Min, 2021, Language Testing


In this study, we present the development of individualized feedback for a large-scale listening assessment by combining standard setting and cognitive diagnostic assessment (CDA) approaches. We used the performance data from 3358 students’ item-level responses to a field test of a national EFL test primarily intended for tertiary-level EFL learners. The results showed that proficiency classifications and subskill mastery classifications were generally of acceptable reliability, and the two kinds of classifications were in alignment with each other at individual and group levels. The outcome of the study is a set of descriptors that describe each test taker’s ability to understand a certain level of oral texts and his or her cognitive performance. The current study, by illustrating the feasibility of combining standard setting and CDA approaches to produce individualized feedback, contributes to the enhancement of score reporting and addresses the long-standing criticism that large-scale language assessments fail to provide individualized feedback to link assessment with instruction.


“…The second implication relates to current scoring practice in EAP assessment. Even if the EAP assessment or language assessment in general measures a multidimensional construct, current scoring practice in operational contexts is confined to unidimensional IRT models, primarily owing to the challenge of interpreting item parameters in MIRT models (Reise et al, 2014) and the complexity of communicating effectively with test stakeholders (Choi & Papageorgiou, 2020). Our study suggested that despite the presence of multidimensionality, fitting a unidimensional IRT model would not bias item parameter estimates for most grade clusters.…”

Section: Discussion (mentioning, confidence: 99%)

Reading is a multidimensional construct at child-L2-English-literacy onset, but comprises fewer dimensions over time: Evidence from multidimensional IRT analysis

2021


This study explored the interplay between content knowledge and reading ability in a large-scale multistage adaptive English for academic purposes (EAP) reading assessment at a range of ability levels across Grades 1–12. The datasets for this study were item-level responses to the reading tests of ACCESS for ELLs Online 2.0. For each of five grade clusters, a sample of 10,000 test takers was randomly drawn from the test-taking population, first without and then with manipulation of proficiency levels. The results indicated that although the bi-factor multidimensional item response theory (MIRT) model fit the data significantly better than the unidimensional two-parameter logistic (2PL) model for Grade 1, no clear evidence was found regarding the dimensionality of the test for Grades 2–12. However, content knowledge was consistently found to contribute substantially to test performance for low-ability-level test takers across all grade clusters. The findings indicate that EAP reading ability is a multidimensional construct at the onset of EAP reading ability development, but the presence of multidimensionality decreases as proficiency level and grade level increase. This study provides insights into the developmental pattern of the interplay between language and content in EAP reading contexts.
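The abstract compares a unidimensional 2PL model with a bi-factor MIRT model. Their standard textbook forms are given below for reference. The notation is ours, and the cited study may parameterize the models differently. The mapping of the general factor to reading ability and of the specific factors to content knowledge is an assumption made for illustration.

```latex
% Unidimensional 2PL: probability that test taker j answers item i correctly,
% with discrimination a_i and difficulty b_i
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\left[-a_i(\theta_j - b_i)\right]}

% Bi-factor MIRT (slope-intercept form): each item loads on a general factor
% \theta_{gj} (assumed here: reading ability) and on exactly one specific
% factor \theta_{s(i)j} (assumed here: a content-knowledge dimension);
% all factors are mutually uncorrelated
P(X_{ij} = 1 \mid \theta_{gj}, \theta_{s(i)j})
  = \frac{1}{1 + \exp\left[-\left(a_{gi}\theta_{gj} + a_{si}\theta_{s(i)j} + d_i\right)\right]}

% When a_{si} = 0 for every item, the bi-factor model reduces to the 2PL with
a_i = a_{gi}, \qquad b_i = -\frac{d_i}{a_{gi}}
```

This nesting is why the two models can be compared directly for each grade cluster. It also explains why fitting the 2PL when the specific slopes are small need not distort the item parameters much.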


“…Listening tests often consist of multiple components targeting different communication goals (Choi and Papageorgiou, 2020). Scores on each component of the listening test, also called listening subscores, may provide added value over the total score.…”

Section: Grading and Awarding (mentioning, confidence: 99%)

“…Scores on each component of the listening test, also called listening subscores, may provide added value over the total score. To examine the justifiability of reporting subscores at the individual and school levels, Choi and Papageorgiou (2020) explored the reliability and distinctiveness of listening and reading subscores of the TOEFL Primary test. Four listening subscores based on different communication goals were targeted, that is, Monologue, Dialogue, Narrative, and Academic subscores.…”

Section: Grading and Awarding (mentioning, confidence: 99%)
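A common way to judge whether a subscore "provides added value over the total score" is Haberman's (2008) classical test theory criterion. Under this criterion, a subscore is worth reporting only if it predicts the corresponding true subscore better than the total score does. "Better" here means a higher proportional reduction in mean squared error (PRMSE). The sketch below implements that criterion from 0/1 item responses and uses coefficient alpha as the reliability estimate. Two assumptions apply: that Choi and Papageorgiou (2020) used exactly this procedure, and that measurement errors are uncorrelated. The function names and the toy response matrix are hypothetical.

```python
from statistics import mean, pvariance


def cronbach_alpha(items: list[list[int]]) -> float:
    """Coefficient alpha for a test-taker x item matrix (needs >= 2 items)."""
    n_items = len(items[0])
    item_vars = [pvariance([row[i] for row in items]) for i in range(n_items)]
    total_var = pvariance([sum(row) for row in items])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def covariance(x: list[float], y: list[float]) -> float:
    """Population covariance of two equally long score lists."""
    mx, my = mean(x), mean(y)
    return mean((a - mx) * (b - my) for a, b in zip(x, y))


def haberman_added_value(
    responses: list[list[int]], sub_items: list[int]
) -> tuple[float, float]:
    """Return (PRMSE of subscore, PRMSE of total score) for the true subscore.

    The subscore has added value if the first number exceeds the second.
    """
    sub_cols = set(sub_items)
    s = [sum(row[i] for i in sub_items) for row in responses]        # subscore
    r = [sum(v for i, v in enumerate(row) if i not in sub_cols)      # rest of test
         for row in responses]
    x = [a + b for a, b in zip(s, r)]                                # total score

    rel_s = cronbach_alpha([[row[i] for i in sub_items] for row in responses])
    var_true_s = rel_s * pvariance(s)          # estimated true-subscore variance
    # With uncorrelated errors, cov(true subscore, total) = var_true_s + cov(s, r).
    cov_true_s_x = var_true_s + covariance(s, r)

    prmse_s = rel_s                                              # subscore predictor
    prmse_x = cov_true_s_x**2 / (var_true_s * pvariance(x))      # total-score predictor
    return prmse_s, prmse_x


# Toy data (made up): 4 test takers x 4 items; items 0-1 form the subscore.
responses = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
prmse_s, prmse_x = haberman_added_value(responses, [0, 1])
print(f"PRMSE(subscore) = {prmse_s:.3f}, PRMSE(total) = {prmse_x:.3f}")
```

In this toy example, the two subscore items are identical, so the subscore is perfectly reliable, and the rest of the test is uncorrelated with it. The subscore therefore beats the total score (1.0 versus 0.5). The reverse is common when subscores are short and highly correlated with the rest of the test. That pattern is consistent with the unsatisfactory subscore results quoted earlier on this page.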

Assessing Second Language Listening Over the Past Twenty Years: A Review Within the Socio-Cognitive Framework

Jiang, 2020, Front. Psychol.


The assessment of second language (L2) listening has received much attention. To understand the state-of-the-art research on L2 listening assessment, a total of 87 studies published in 14 peer-reviewed journals and two research report series between 2001 and 2020 were reviewed, using the socio-cognitive framework for developing and validating listening tests proposed by Weir (2005). Thirteen research themes were identified in relation to the six components of the framework, including test-taker characteristics, cognitive validity, context validity, scoring validity, consequential validity, and criterion-related validity. Context validity was the most investigated component, covering three research themes, that is, task setting, linguistic demands (input and output), and speakers. Based on a detailed analysis of the 13 research themes, recommendations for future research in L2 listening assessment were given.
