Using traces of behavior to predict outcomes is useful in contexts ranging from buyer behavior to data collected from smart-home devices. Higher education systems increasingly use Learning Management System (LMS) digital data to capture and understand students’ learning and well-being, and researchers in the social sciences are likewise interested in the potential of digital log data to predict outcomes and design interventions. Using LMS data to predict the likelihood of students’ success in for-credit college courses provides a useful example of how social scientists can apply these techniques to a variety of data types. Here, we provide a primer on how LMS data can be feature-mapped and analyzed to accomplish these goals. We begin with a literature review summarizing current approaches to analyzing LMS data, then discuss ethical issues of privacy when using demographic data and equitable model building. In the second part of the paper, we provide an overview of popular machine learning algorithms and review analytic considerations such as feature generation, assessment of model performance, and sampling techniques. Finally, we conclude with an empirical example demonstrating the ability of LMS data to predict student success, summarizing important features and assessing model performance across different model specifications.
Undergraduate science, technology, engineering, and mathematics (STEM) students’ motivations have a strong influence on whether and how they will persist through challenging coursework and into STEM careers. Proper conceptualization and measurement of motivation constructs, such as students’ expectancies and perceptions of value and cost (i.e., expectancy value theory [EVT]) and their goals (i.e., achievement goal theory [AGT]), are necessary to understand and enhance STEM persistence and success. Research findings suggest the importance of exploring multiple measurement models for motivation constructs, including traditional confirmatory factor analysis, exploratory structural equation models (ESEM), and bifactor models, but more research is needed to determine whether the same model fits best across time and context. As such, we measured undergraduate biology students’ EVT and AGT motivations and investigated which measurement model best fit the data, and whether measurement invariance held, across three semesters. Having determined the best-fitting measurement model and type of invariance, we used scores from the best-performing model to predict biology achievement. Measurement results indicated that a bifactor-ESEM model had the best data-model fit for EVT and an ESEM model had the best data-model fit for AGT, with evidence of measurement invariance across semesters. Motivation factors, in particular attainment value and subjective task value, predicted small yet statistically significant amounts of variance in biology course outcomes each semester. Our findings provide support for using modern measurement models to capture students’ STEM motivations and potentially refine conceptualizations of them. Future research along these lines will enhance educators’ ability to benevolently monitor and support students’ motivation, and to enhance STEM performance and career success.