2002
DOI: 10.1016/s0747-5632(01)00052-8
Stumping e-rater: challenging the validity of automated essay scoring

Cited by 93 publications (78 citation statements)
References 9 publications
“…Only one published study addressed the defensive argument directly. In Powers et al. (2001), students and teachers were asked to write "bad faith" essays deliberately in order to fool the e-rater system into assigning a higher (and lower) score than they deserved. Surely more such studies are needed in order to delineate more clearly the limitations of AES systems.…”
Section: J·T·L·A
Citation type: mentioning
Confidence: 99%
“…Wang and Brown, 2007), and validating computerised scoring systems (e.g. Powers et al, 2001), and claims have been made that the reliability of such applications in assessing writing matches that of human raters (e.g. Dikli, 2006).…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
“…E-Rater [14] is one of the first automated systems to evaluate text difficulty based on three general classes of essay features: structure (e.g., sentence syntax, proportion of spelling, grammar, usage or mechanics errors), organization based on various discourse features, and content based on prompt-specific vocabulary. Several other tools for automated essay grading or for assessing the textual complexity of a given text have been developed and employed in various educational programs [5,15]: Lexile (MetaMetrics), ATOS (Renaissance Learning), Degrees of Reading Power: DRP Analyzer (Questar Assessment, Inc.), REAP (Carnegie Mellon University), SourceRater (Educational Testing Service), Coh-Metrix (University of Memphis), Markit (Curtin University of Technology) [16], IntelliMetric [17] or Writing Pal (Arizona State University) [18,19].…”
Section: State of the Art
Citation type: mentioning
Confidence: 99%