1998
DOI: 10.1111/j.1745-3984.1998.tb00534.x

Moderating Possibly Irrelevant Multiple Mean Score Differences on a Test of Mathematical Reasoning

Abstract: A pool of items from operational tests of mathematical reasoning was constructed to investigate the feasibility of using automated test assembly (ATA) methods to simultaneously moderate possibly irrelevant differences between the performance of women and men, and African American and White test takers. None of the artificial tests exhibited substantial impact moderation, although the estimated mean scaled score differences for the relevant population indicated a modest move in the intended direction: the diffe…

Cited by 6 publications (12 citation statements)
References 20 publications
“…Stocking et al (1998a), estimated the same result for their moderated impact tests. In addition, a larger average standard error of measurement can be expected for tests designed to have moderated impact when compared to tests assembled without regard to impact.…”
Section: Reliability, SEM, and (Concurrent) Validity (mentioning)
confidence: 53%
“…The first method, called "test construction" (TC), uses the WDM directly to simultaneously satisfy all statistical and nonstatistical constraints on item selection, including the moderation of the three different kinds of impact of interest. This is the same approach that was used in the previous study by Stocking et al (1998a). Two versions were tried, one in which a small moderation in the three kinds of impact was the goal (TC-S), and a second in which a larger moderation in the three kinds of impact was the goal (TC-L).…”
Section: Impact (mentioning)
confidence: 99%
“…88-91). Similarly, Stocking, Jirele, Lewis, and Swanson (1998) have demonstrated that different sets of items, all of which meet the same detailed set of test specifications, can produce variations in the size of group differences.…”
Section: Group Differences (mentioning)
confidence: 94%