The early identification of college students at risk of dropping out is of great interest worldwide, since leaving higher education early is associated with considerable personal and social costs. In Hungary, the dropout rate is particularly high, especially in STEM undergraduate programs, and much higher than the EU average. In this work, we use advanced machine learning models, such as deep neural networks and gradient boosted trees, to predict the final academic performance of students at the Budapest University of Technology and Economics. The dropout prediction is based on data available at the time of enrollment. In addition to making predictions, we interpret our models using state-of-the-art interpretable machine learning techniques such as permutation importance and SHAP values. The best-performing deep learning model achieves an accuracy of 72.4% and an AUC of 0.771, slightly outperforming XGBoost, the cutting-edge benchmark model for tabular data.