Across the sciences, the quality of measurements matters: conclusions are only as trustworthy as the data behind them. Survey measures are appropriate for use only when researchers have validity evidence within their particular context. Yet this step is frequently skipped or goes unreported in educational research. This article briefly reviews the aspects of validity that researchers should consider when using surveys. It then focuses on factor analysis, a statistical method for collecting an important type of validity evidence. Factor analysis helps researchers explore or confirm relationships among survey items and identify the number of dimensions represented on the survey. The essential steps for conducting and interpreting a factor analysis are described. This use of factor analysis is illustrated throughout by a validation of Diekman and colleagues’ goal endorsement instrument for use with first-year undergraduate science, technology, engineering, and mathematics students. We provide example data, annotated code, and output for analyses in R, an open-source programming language and software environment for statistical computing. For education researchers using surveys, understanding the theoretical and statistical underpinnings of survey validity is fundamental to implementing rigorous education research.
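The article's worked analyses are in R; as a language-agnostic illustration of the core idea, the sketch below simulates responses to six items driven by two latent factors and recovers the number of dimensions with the Kaiser (eigenvalue greater than 1) criterion. The sample size, loadings, and item structure are invented for illustration, and the eigenvalues are extracted with plain power iteration rather than a statistics library:

```python
import math
import random

random.seed(1)

# Simulate 2000 respondents on 6 items: items 1-3 load 0.8 on latent
# factor 1, items 4-6 load 0.8 on factor 2 (illustrative values only).
n = 2000
data = []
for _ in range(n):
    f1, f2 = random.gauss(0, 1), random.gauss(0, 1)
    row = [0.8 * f1 + 0.6 * random.gauss(0, 1) for _ in range(3)]
    row += [0.8 * f2 + 0.6 * random.gauss(0, 1) for _ in range(3)]
    data.append(row)

def correlation_matrix(rows):
    """Pearson correlations between the columns of a list-of-lists."""
    k, m = len(rows[0]), len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / m for c in cols]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in c) / m)
           for c, mu in zip(cols, means)]
    return [[sum((cols[i][t] - means[i]) * (cols[j][t] - means[j])
                 for t in range(m)) / (m * sds[i] * sds[j])
             for j in range(k)] for i in range(k)]

def top_eigenvalues(A, how_many, iters=500):
    """Largest eigenvalues of a symmetric matrix via power iteration
    with deflation (A <- A - lambda * v v^T after each extraction)."""
    A = [row[:] for row in A]
    k, out = len(A), []
    for _ in range(how_many):
        v = [random.random() for _ in range(k)]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(k)) for i in range(k)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(k))
                  for i in range(k))
        out.append(lam)
        for i in range(k):
            for j in range(k):
                A[i][j] -= lam * v[i] * v[j]
    return out

R = correlation_matrix(data)
eigs = top_eigenvalues(R, 3)
n_factors = sum(e > 1.0 for e in eigs)  # Kaiser criterion
print([round(e, 2) for e in eigs], "->", n_factors, "factors")
```

With this structure the two block-correlated item clusters produce two eigenvalues near 2.3 and four near 0.36, so the criterion recovers the two intended dimensions; real validations would follow up with rotated loadings and model-fit statistics.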
This paper presents the development and validation of the Laboratory Course Assessment Survey (LCAS), a measure of three laboratory course design features: collaboration, discovery and relevance, and iteration. Results from analysis of LCAS data indicate that the instrument is useful for distinguishing research-based courses from traditional laboratory courses.
Course-based undergraduate research experiences (CUREs) provide a promising avenue to attract a larger and more diverse group of students into research careers. CUREs are thought to be distinctive in offering students opportunities to make discoveries, collaborate, engage in iterative work, and develop a sense of ownership of their lab course work. Yet how these elements affect students’ intentions to pursue research-related careers remains unexplored. To address this knowledge gap, we collected data on three design features thought to be distinctive of CUREs (discovery, iteration, collaboration) and on students’ levels of ownership and career intentions from ∼800 undergraduates who had completed CURE or inquiry courses, including courses from the Freshman Research Initiative (FRI), which has a demonstrated positive effect on student retention in college and in science, technology, engineering, and mathematics. We used structural equation modeling to test relationships among the design features and student ownership and career intentions. We found that discovery, iteration, and collaboration had small but significant effects on students’ intentions; these effects were fully mediated by student ownership. Students in FRI courses reported significantly higher levels of discovery, iteration, and ownership than students in other CUREs. FRI research courses alone had a significant effect on students’ career intentions.
Undergraduate research with mentorship from faculty may be particularly important for ensuring the persistence of women and minority students in science. This study examines whether undergraduate researchers’ outcomes differ in relation to their gender or race/ethnicity and whether the mentoring structures they experience explain the differences.
This study describes the development of a survey grounded in expectancy-value theory, providing multiple forms of validity evidence to support its use as a measure of students’ interest in using math to understand biology, the usefulness of math for one’s life science career, and the perceived cost of using math in biology courses.
Undergraduate life science majors are reputed to have negative attitudes toward mathematics, yet little empirical evidence supports this belief. We report the adaptation of a semantic differential measure of science and math majors’ emotional satisfaction with math, along with initial findings from its use.
Direct observation recording procedures produce reductive summary measurements of an underlying stream of behavior. Previous methodological studies of these recording procedures have employed simulation methods for generating random behavior streams, many of which amount to special cases of a statistical model known as the alternating renewal process. This paper describes the alternating renewal process model in its general form, demonstrates how it provides an organizing framework for most past simulation research on direct observation procedures, and introduces a freely available software package that implements the model. The software can be used to simulate behavior streams as well as data from many common recording procedures, including continuous recording, momentary time sampling, event counting, and interval recording procedures. Several examples illustrate how the software can be used to study the validity and reliability of direct observation data and to develop measurement strategies during the planning phases of empirical studies.
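As a rough illustration of the model described above (not the package's actual API), the following sketch simulates an alternating renewal process with exponential event and interim durations, then applies three of the recording procedures mentioned: continuous recording, momentary time sampling, and partial interval recording. The distributions, session length, and interval length are assumptions chosen for illustration:

```python
import random

random.seed(42)

# Alternating renewal process: behavior events with exponential durations
# (mean 10 s) alternate with exponential interim times (mean 30 s), so the
# long-run prevalence of the behavior is 10 / (10 + 30) = 0.25.
MEAN_EVENT, MEAN_INTERIM = 10.0, 30.0
SESSION = 600.0   # one 10-minute observation session
INTERVAL = 15.0   # interval length for the sampled recording procedures

def simulate_stream():
    """Return the behavior stream as a list of (start, end) event times."""
    events, t = [], random.expovariate(1 / MEAN_INTERIM)
    while t < SESSION:
        dur = random.expovariate(1 / MEAN_EVENT)
        events.append((t, min(t + dur, SESSION)))
        t += dur + random.expovariate(1 / MEAN_INTERIM)
    return events

def occurring(events, time):
    return any(s <= time < e for s, e in events)

def continuous_recording(events):
    """True proportion of session time occupied by the behavior."""
    return sum(e - s for s, e in events) / SESSION

def momentary_time_sampling(events):
    """Share of interval endpoints at which the behavior is occurring."""
    k = int(SESSION / INTERVAL)
    return sum(occurring(events, i * INTERVAL) for i in range(1, k + 1)) / k

def partial_interval_recording(events):
    """Share of intervals that overlap any part of an event."""
    k = int(SESSION / INTERVAL)
    return sum(any(s < (i + 1) * INTERVAL and e > i * INTERVAL
                   for s, e in events)
               for i in range(k)) / k

# Average each reductive summary measure over many simulated sessions.
reps = 1000
cr = mts = pir = 0.0
for _ in range(reps):
    ev = simulate_stream()
    cr += continuous_recording(ev)
    mts += momentary_time_sampling(ev)
    pir += partial_interval_recording(ev)
cr, mts, pir = cr / reps, mts / reps, pir / reps
print(f"continuous={cr:.3f}  MTS={mts:.3f}  partial-interval={pir:.3f}")
```

Under these assumptions, momentary time sampling tracks the true prevalence of about 0.25 closely, while partial interval recording overestimates it substantially; questions of this kind about the validity of recording procedures are exactly what such simulations are designed to probe.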
An important consideration of any computer adaptive testing (CAT) program is the criterion used for ending item administration: the stopping rule, which ensures that all examinees are assessed to the same standard. Although various stopping rules exist, none of them have been compared under the generalized partial-credit model (Muraki in Applied Psychological Measurement, 16, 159-176, 1992). In this simulation study we compared the performance of three variable-length stopping rules (standard error [SE], minimum information [MI], and change in theta [CT]), both in isolation and in combination with requirements of minimum and maximum numbers of items, as well as a fixed-length stopping rule. Each stopping rule was examined under two termination criteria: a more lenient requirement (SE = 0.35, MI = 0.56, CT = 0.05) and a more stringent one (SE = 0.30, MI = 0.42, CT = 0.02). The simulation design also included content-balancing and exposure controls, aspects of CAT that had been excluded from previous research comparing variable-length stopping rules. The minimum-information stopping rule produced biased theta estimates and varied greatly in measurement quality across the theta distribution. The absolute-change-in-theta stopping rule performed well when paired with a lower criterion and a minimum test length. The standard-error stopping rule consistently provided the best balance of measurement precision and operational efficiency, requiring the fewest administered items to obtain accurate and precise theta estimates, particularly when paired with a maximum-number-of-items stopping rule.
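The standard-error stopping rule can be sketched as follows. This is a simplified illustration, not the study's simulation: it uses a dichotomous Rasch (1PL) item pool rather than the generalized partial-credit model, and it omits content balancing and exposure control. The pool size, SE target, and item limits below are invented for the example:

```python
import math
import random

random.seed(7)

# A pool of 300 Rasch (1PL) items with difficulties spread over [-3, 3].
pool = [random.uniform(-3, 3) for _ in range(300)]

def p_correct(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1 / (1 + math.exp(-(theta - b)))

def info(theta, b):
    """Fisher information of one Rasch item at ability theta."""
    p = p_correct(theta, b)
    return p * (1 - p)

def simulate_cat(true_theta, se_target=0.35, min_items=5, max_items=50):
    """Administer maximum-information items until the SE stopping rule
    fires (SE = 1 / sqrt(test information) < se_target) or max_items."""
    theta, used, responses = 0.0, set(), []
    while True:
        # Select the unused item most informative at the provisional
        # ability estimate.
        j = max((i for i in range(len(pool)) if i not in used),
                key=lambda i: info(theta, pool[i]))
        used.add(j)
        responses.append(
            (pool[j], random.random() < p_correct(true_theta, pool[j])))
        # A few Newton-Raphson steps toward the ML ability estimate,
        # clamped to [-4, 4] to handle all-correct/all-wrong patterns.
        for _ in range(10):
            grad = sum(int(x) - p_correct(theta, b) for b, x in responses)
            hess = -sum(info(theta, b) for b, _ in responses)
            theta = min(4.0, max(-4.0, theta - grad / hess))
        se = 1 / math.sqrt(sum(info(theta, b) for b, _ in responses))
        if (len(responses) >= min_items and se < se_target) \
                or len(responses) >= max_items:
            return theta, se, len(responses)

theta_hat, se, n_items = simulate_cat(true_theta=1.0)
print(f"theta_hat={theta_hat:.2f}  SE={se:.2f}  items={n_items}")
```

Pairing the SE criterion with minimum and maximum test lengths, as in the study's better-performing conditions, prevents both premature termination and unbounded tests when the pool is poorly targeted for an examinee.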