Background
Early identification of students who are potential candidates for a degree in a Science, Technology, Engineering, or Mathematics (STEM) major would enable educators to offer programs designed to strengthen student interests and capabilities in those areas.
Purpose (Hypothesis)
This study uses an integrated model that leverages the strengths of multiple statistical techniques to analyze the educational process from pre‐high school through college and to predict which students will earn a STEM degree.
Design/Method
The probability of earning a STEM degree is modeled using variables available as of the eighth grade as well as standardized test scores from high school. These include demographic, attitudinal, experiential, and academic performance measures derived from the National Education Longitudinal Study of 1988 (NELS:88) dataset. The integrated model combines logistic regression, survival analysis, and receiver operating characteristic (ROC) curve analysis to predict whether an individual is likely to obtain a STEM degree.
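As an illustration of the ROC step only, the following minimal sketch shows one common way predicted probabilities can be turned into a yes/no classification. It is not the study's procedure: the abstract does not say how the cutoff was chosen, so the use of Youden's J statistic here is an assumption, and the labels and scores are invented toy values rather than NELS:88 data. The function names `roc_points`, `auc`, and `best_threshold` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (threshold, false positive rate, true positive rate)


def roc_points(labels: List[int], scores: List[float]) -> List[Point]:
    """Compute one ROC point per distinct score, from highest threshold to lowest."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = []
    for thr in sorted(set(scores), reverse=True):
        # A case is predicted positive when its score is at or above the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((thr, fp / neg, tp / pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoid rule, starting from (0, 0)."""
    area = 0.0
    prev_fpr, prev_tpr = 0.0, 0.0
    for _, fpr, tpr in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


def best_threshold(points: List[Point]) -> float:
    """Pick the threshold maximizing Youden's J = TPR - FPR (an assumed criterion)."""
    return max(points, key=lambda p: p[2] - p[1])[0]


# Toy data (hypothetical): 1 = earned a STEM degree, 0 = did not.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
# Hypothetical predicted probabilities, e.g. from a fitted logistic regression.
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.25, 0.15]

points = roc_points(labels, scores)
auc_value = auc(points)             # 0.875 for this toy data
threshold = best_threshold(points)  # 0.75 for this toy data
print(auc_value, threshold)
```

Under this assumed criterion, students whose predicted probability meets or exceeds the selected threshold would be flagged as likely STEM degree earners. Other criteria, such as a cost-weighted trade-off between false positives and false negatives, would generally give a different cutoff.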
Results
Predictions from the integrated model were compared with actual outcomes and with the predictions of a separate logistic regression model. The modeling process identified a set of significant predictive variables and achieved very good predictive accuracy. The integrated model and the logistic regression model performed with comparable precision.
Conclusions
The modeling process was adept at identifying STEM students, as well as a large pool of students earning other degrees who might have been capable of pursuing a STEM degree. The results suggest that it is feasible to identify good STEM candidates for a pro‐STEM intervention that would engage their interest in STEM and support stronger quantitative skill development.