Our study provides strong evidence that a sexually transmitted infection causes anal cancer. The presence of high-risk types of HPV, notably HPV-16 (which is known to cause cancer of the cervix), in the majority of anal-cancer tissue specimens suggests that most anal cancers are potentially preventable.
The Percentage of Proficient Students (PPS) has become a ubiquitous statistic under the No Child Left Behind Act. This focus on proficiency has statistical and substantive costs. The author demonstrates that the PPS metric offers only limited and unrepresentative depictions of large-scale test score trends, gaps, and gap trends. The limitations are unpredictable, dramatic, and difficult to correct in the absence of other data. Interpretation of these depictions generally leads to incorrect or incomplete inferences about distributional change. The author shows how the statistical shortcomings of these depictions extend to shortcomings of policy, from exclusively encouraging score gains near the proficiency cut score to shortsighted comparisons of state and national testing results. The author proposes alternatives for large-scale score reporting and argues that a distribution-wide perspective on results is required for any serious analysis of test score data, including "growth"-related results under the recent Growth Model Pilot Program.
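To make the cut-score dependence concrete, here is a minimal sketch (not from the article; the distributions, cut scores, and sample sizes are hypothetical) showing how the same uniform score gain produces very different PPS trends depending on where the proficiency cut falls:

```python
# Hypothetical illustration: PPS trends depend on cut placement.
import numpy as np

rng = np.random.default_rng(0)
year1 = rng.normal(loc=0.0, scale=1.0, size=100_000)  # baseline scores
year2 = rng.normal(loc=0.2, scale=1.0, size=100_000)  # uniform +0.2 SD gain

def pps(scores, cut):
    """Percentage of students at or above the proficiency cut."""
    return 100 * np.mean(scores >= cut)

for cut in (-1.5, 0.0, 1.5):
    gain = pps(year2, cut) - pps(year1, cut)
    print(f"cut={cut:+.1f}  PPS gain = {gain:+.1f} points")

# The identical 0.2 SD shift looks large when the cut sits near the middle
# of the distribution (~8 points) and small when the cut sits in either
# tail (~2-3 points), so PPS trends reflect cut placement as much as
# distributional change.
```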
Problems of scale typically arise when comparing test score trends, gaps, and gap trends across different tests. To overcome some of these difficulties, test score distributions on the same score scale can be represented by nonparametric graphs or statistics that are invariant under monotone scale transformations. This article motivates and then develops a framework for the comparison of these nonparametric trend, gap, and gap trend representations across tests. The connections between this framework and other nonparametric tools, including probability–probability (PP) plots, the Mann-Whitney U test, and the statistic known as P(Y > X), are highlighted. The author describes the advantages of this framework over scale-dependent trend and gap statistics and demonstrates applications of these nonparametric methods to frequently asked policy questions.
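As a minimal illustration of the framework's core statistic, under hypothetical normal samples: P(Y > X) equals the Mann-Whitney U statistic divided by the number of pairs, and, because it depends only on ranks, it is unchanged by any monotone rescaling of the score scale:

```python
# Hypothetical illustration of the scale-invariant gap statistic P(Y > X).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 500)  # group X scores (hypothetical)
y = rng.normal(0.3, 1.0, 500)  # group Y scores (hypothetical)

def p_y_gt_x(y, x):
    """Estimate P(Y > X) over all pairs; ties count half.
    Equivalently, the Mann-Whitney U statistic divided by n_x * n_y."""
    gt = (y[:, None] > x[None, :]).mean()
    eq = (y[:, None] == x[None, :]).mean()
    return gt + 0.5 * eq

print(f"P(Y > X)           = {p_y_gt_x(y, x):.3f}")  # ~0.58 for a 0.3 SD gap
# Invariance under monotone transformation: the same value after rescaling.
print(f"P(exp(Y) > exp(X)) = {p_y_gt_x(np.exp(y), np.exp(x)):.3f}")
```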
In massive open online courses (MOOCs), low barriers to registration attract large numbers of students with diverse interests and backgrounds, and student use of course content is asynchronous and unconstrained. The authors argue that MOOC data are not only plentiful and different in kind but also require reconceptualization: new educational variables or different interpretations of existing variables. The authors illustrate this by demonstrating the inadequacy or insufficiency of conventional interpretations of four variables for quantitative analysis and reporting: enrollment, participation, curriculum, and achievement. Drawing from 230 million clicks from 154,763 registrants for a prototypical MOOC offering in 2012, the authors present new approaches to describing and understanding user behavior in this emerging educational context.
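As one illustration of the reconceptualization, here is a minimal sketch (with a hypothetical clickstream schema, not the authors' data or code) of reinterpreting "enrollment" by classifying registrants according to observed activity rather than counting all registrants equally:

```python
# Hypothetical schema: one row per click, with user_id and event_type.
import pandas as pd

clicks = pd.DataFrame({
    "user_id":    [1, 1, 2, 3, 3, 3],
    "event_type": ["view", "submit", "view", "view", "submit", "submit"],
})

# Classify each user by whether they ever viewed content or submitted work.
activity = clicks.groupby("user_id")["event_type"].agg(
    viewed=lambda e: (e == "view").any(),
    submitted=lambda e: (e == "submit").any(),
)

# "Registered only" users never appear in the clickstream, so a separate
# registration roster (hypothetical here) is needed to count them at all.
registrants = {1, 2, 3, 4}
never_active = registrants - set(activity.index)

print(activity)
print("registered but never active:", never_active)
```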
We describe a cheating strategy enabled by the features of massive open online courses (MOOCs) and detectable by virtue of the sophisticated data systems that MOOCs provide. The strategy, Copying Answers using Multiple Existences Online (CAMEO), involves a user who gathers solutions to assessment questions using a "harvester" account and then submits correct answers using a separate "master" account. We use "clickstream" learner data to detect CAMEO use among 1.9 million course participants in 115 MOOCs from two universities. Using conservative thresholds, we estimate CAMEO prevalence at 1,237 certificates, accounting for 1.3% of the certificates in the 69 MOOCs with CAMEO users. Among earners of 20 or more certificates, 25% have used the CAMEO strategy. CAMEO users are more likely to be young, male, and international than other MOOC certificate earners. We identify preventive strategies that can decrease CAMEO rates and show evidence of their effectiveness in science courses.
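A minimal sketch of the timing signal behind CAMEO detection follows; the field names, IP-matching rule, time window, and thresholds below are illustrative assumptions, not the authors' detection pipeline:

```python
# Hypothetical illustration: flag account pairs where one account submits
# an answer and a second account on the same IP submits the correct answer
# to the same question shortly afterward.
from collections import defaultdict

# Each submission: (account, ip, question, is_correct, time_in_seconds)
submissions = [
    ("harvester", "10.0.0.1", "q1", False, 100),
    ("master",    "10.0.0.1", "q1", True,  130),
    ("harvester", "10.0.0.1", "q2", False, 200),
    ("master",    "10.0.0.1", "q2", True,  215),
]

WINDOW = 300  # max seconds between harvest and master submission (assumed)

pair_counts = defaultdict(int)
for a1, ip1, q1, _, t1 in submissions:
    for a2, ip2, q2, ok2, t2 in submissions:
        if (a1 != a2 and ip1 == ip2 and q1 == q2
                and ok2 and 0 < t2 - t1 <= WINDOW):
            pair_counts[(a1, a2)] += 1

# Flag pairs where many questions show the harvest-then-master pattern.
for (harvester, master), n in pair_counts.items():
    if n >= 2:  # conservative per-pair threshold (assumed)
        print(f"{harvester} -> {master}: {n} suspicious questions")
```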
Linking score scales across different tests is considered speculative and fraught, even at the aggregate level. We introduce and illustrate validation methods for aggregate linkages, using the challenge of linking U.S. school district average test scores across states as a motivating example. We show that aggregate linkages can be validated both directly and indirectly under certain conditions, such as when the scores for at least some target units (districts) are available on a common test (e.g., the National Assessment of Educational Progress). We introduce precision-adjusted random effects models to estimate linking error, for populations and for subpopulations, for averages and for progress over time. These models allow us to distinguish linking error from sampling variability and illustrate how linking error plays a larger role in aggregates with smaller sample sizes. Assuming that target districts generalize to the full population of districts, we can show that standard errors for district means are generally less than .2 standard deviation units, leading to reliabilities above .7 for roughly 90% of districts. We also show how sources of imprecision and linking error contribute to district comparisons within versus between states. This approach is applicable whenever the essential counterfactual question—“what would means/variance/progress for the aggregate units be, had students taken the other test?”—can be answered directly for at least some of the units.
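A minimal simulation sketch of the core decomposition follows (all values are illustrative assumptions, not the article's precision-adjusted models): when some districts have both a linked mean and a direct mean on the common test, the discrepancy variance in excess of known sampling variance estimates the linking-error variance, which in turn feeds district-level reliabilities:

```python
# Hypothetical illustration: separating linking error from sampling
# variability using districts observed on both tests.
import numpy as np

rng = np.random.default_rng(2)
n_districts = 500
true_means = rng.normal(0.0, 0.5, n_districts)       # district effects (SD units)
sampling_se = rng.uniform(0.02, 0.15, n_districts)   # smaller districts -> larger SE
linking_sd = 0.10                                    # assumed linking-error SD

direct = true_means + rng.normal(0, sampling_se)
linked = true_means + rng.normal(0, sampling_se) + rng.normal(0, linking_sd, n_districts)

# E[(linked - direct)^2] = 2 * sampling_var + linking_var, so:
diff = linked - direct
linking_var_hat = max(np.mean(diff**2 - 2 * sampling_se**2), 0.0)
print(f"estimated linking-error SD: {np.sqrt(linking_var_hat):.3f}")

# Reliability of a linked district mean: true variance over total variance.
total_se2 = sampling_se**2 + linking_var_hat
reliability = true_means.var() / (true_means.var() + total_se2)
print(f"share of districts with reliability > .7: {np.mean(reliability > 0.7):.2f}")
```

Because the linking-error term does not shrink with district size, it dominates total error for large districts while sampling variability dominates for small ones, which is why the sketch draws each district's sampling standard error separately.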