Four writing samples were obtained from 638 applicants for admission to U.S. institutions as undergraduates or as graduate students in business, engineering, or social science. The applicants represented three major foreign language groups (Arabic, Chinese, and Spanish), plus a small sample of native English speakers. Two of the writing topics were of the compare-and-contrast type and the other two involved chart and graph interpretation. The writing samples were scored by 23 readers who were English as a second language (ESL) specialists and 23 readers who were English writing experts. Each of the four writing samples was scored holistically, and during a separate rating session two of the samples from each student were assigned separate scores for sentence-level and discourse-level skills. Representative subsamples of the papers were also scored descriptively with the Writer's Workbench computer program and by graduate-level subject-matter professors in engineering and the social sciences. In addition to the writing sample scores, TOEFL scores were obtained for all students in the foreign sample. GRE General Test scores were obtained for students in the U.S. sample and for a subsample of students in the foreign sample. Students in the U.S. sample also took a multiple-choice measure of writing ability.
Among the key findings were the following: (1) holistic scores, discourse-level scores, and sentence-level scores were so closely related that the holistic score alone should be sufficient; (2) correlations among topics were as high across topic types as within topic types; (3) scores of ESL raters, English raters, and subject-matter raters were all highly correlated, suggesting substantial agreement in the standards used; (4) correlations and factor analyses indicated that scores on the writing samples and the TOEFL were highly related, but that each was also reliably measuring some aspect of English language proficiency not assessed by the other; and (5) correlations of holistic writing sample scores with scores on item types within the sections of the GRE General Test yielded a pattern of relationships consistent with those reported in other GRE studies.
An earlier investigation (Bejar, 1983) had argued that experts' judgments of item difficulty could perhaps be usefully supplemented with linguistic information about the sentence from which the item was derived. To investigate that idea, we analyzed items from the earlier study to determine their syntactic structure. Three potential independent variables were studied by themselves and in conjunction with subject-matter ratings. The analysis suggested that the combination of experts' judgments and syntactic information about the sentence on which the item was based predicted difficulty better than either judgments or syntactic information alone. Together, the judgments and syntactic information accounted for 31% of the variance in item difficulty.
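The logic of that result can be illustrated with a small regression sketch. The data below are synthetic and the variable names (expert judgment score, syntactic complexity score) are placeholders standing in for the study's actual predictors; the point is only the general pattern that two partially overlapping predictors, fit jointly, can explain more variance than either one alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Synthetic stand-ins: an expert difficulty judgment and a syntactic
# complexity score, each only partially predictive of item difficulty.
judgment = rng.normal(size=n)
syntax = rng.normal(size=n)
difficulty = 0.4 * judgment + 0.4 * syntax + rng.normal(scale=1.0, size=n)

def r_squared(X, y):
    """Proportion of variance in y explained by an OLS fit on X."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

r2_judgment = r_squared(judgment[:, None], difficulty)
r2_syntax = r_squared(syntax[:, None], difficulty)
r2_both = r_squared(np.column_stack([judgment, syntax]), difficulty)

print(f"judgment alone: {r2_judgment:.2f}")
print(f"syntax alone:   {r2_syntax:.2f}")
print(f"combined:       {r2_both:.2f}")
```

Because the single-predictor models are nested inside the two-predictor model, the combined R-squared can never fall below either individual one on the same sample; how much it rises depends on how much unique variance each predictor contributes.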
Educational Testing Service is currently developing a new generation of teacher assessments: The Praxis Series: Professional Assessments for Beginning Teachers™. The assessment series consists of three separate, but related, components. Praxis I: Academic Skills Assessments will assess the candidate's command of basic academic or enabling skills in reading, writing, and mathematics. Praxis II: Subject Assessments will test the candidate's grasp of subject matter and his or her knowledge of the teaching and learning process. Praxis III: Classroom Performance Assessments will assess the candidate's application of this knowledge in an actual classroom setting. This document describes a series of formative studies that were conducted in support of the development of Praxis III. The research efforts were targeted in three broad areas: (a) field-testing of the various data-collection instruments; (b) examination of the processes and strategies involved in retrieving, coding, and evaluating teacher performance data; and (c) analysis of how the performance assessment addresses issues of diversity in teaching and learning. The overarching goal of these studies was to identify strengths of the performance assessment system as well as aspects that needed further refinement. The studies were conducted in Minneapolis, Minnesota, and Dover, Delaware, during November and December 1991. Trained assessors working in pairs carried out an assessment cycle in which they observed a candidate teaching a lesson, interviewed the candidate before and after the observation, and reviewed several documents the candidate had completed. A total of 18 candidates were evaluated. The assessors took notes during the interviews and observations, and then coded them. From their coded notes, assessors selected pieces of evidence to include on a Record-of-Evidence form, a document that summarizes the evidence the assessor obtained for 21 criteria of good teaching and provides a rationale for each rating.
Assessors weighed the evidence they obtained for each criterion and used a scoring rule to assign a rating on the criterion. When the assessors had finished rating candidates, they met as a group to evaluate the assessment system. The assessors completed questionnaires and work sheets and engaged in small‐ and large‐group discussions to share their reactions to using the assessment system. These activities and the records of evidence provided the data for the formative studies. This overview document highlights the major findings from each of the formative studies and discusses the implications of those findings for the Praxis III assessment system. The last section of the paper describes how the developers used the results of the formative evaluation to guide them in making a number of informed changes in Praxis III, that is, in revising the domain descriptions, criterion descriptions, and accompanying scoring rules. Changes were made in the data‐collection instruments and in the assessor training program, and new procedural guidelines for carr...