Tools for automatically grading programming assignments, also known as Online Judges, have been widely used to support computer science (CS) courses. Nevertheless, few studies have used these tools to acquire and analyse interaction data to better understand students' performance and behaviours, often due to limited data availability or inadequate granularity. To address this problem, we propose an Online Judge called CodeBench, which allows for fine-grained collection of student interaction data, e.g., keystrokes, number of submissions, and grades. We deployed CodeBench for 3 years (2016–18) and collected data from 2058 students across 16 introductory computer science (CS1) courses, on which we carried out fine-grained learning analytics aimed at early detection of effective/ineffective behaviours in learning CS concepts. Our results reveal clear behavioural classes of CS1 students, differentiated both semantically and statistically, enabling us to better explain how student behaviours during programming influence learning outcomes. Finally, we identify behaviours that can guide novice students towards improved learning performance, which can be used for interventions. We believe this work is a step towards enhancing Online Judges and helping teachers and students improve their CS1 teaching/learning practices.
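The fine-grained interaction capture described above can be sketched as a minimal event log that records raw events (keystrokes, submissions, grades) and aggregates them into per-student features. This is an illustrative sketch only; the class and field names below are assumptions, not CodeBench's actual schema.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Event:
    student_id: str
    kind: str          # e.g. "keystroke", "submission", "grade"
    payload: str
    timestamp: float = field(default_factory=time.time)

class InteractionLog:
    """Append-only log of fine-grained student interaction events."""

    def __init__(self):
        self.events = []

    def record(self, student_id, kind, payload=""):
        self.events.append(Event(student_id, kind, payload))

    def count(self, student_id, kind):
        # Aggregate raw events into per-student features,
        # e.g. number of submissions or keystroke volume.
        return sum(1 for e in self.events
                   if e.student_id == student_id and e.kind == kind)

log = InteractionLog()
log.record("s1", "keystroke", "d")
log.record("s1", "keystroke", "ef")
log.record("s1", "submission", "attempt-1")
```

Counts derived this way (submissions per student, keystroke volume per session) are the kind of aggregate feature the learning analytics below would consume.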
In this work, we present an approach to predict student performance within the very first two weeks of CS1 classes that use programming online judges. We framed the prediction as binary classification, i.e., we estimated whether the student would pass or fail. To do so, we employed a method that uses an evolutionary algorithm to automatically build and optimise the machine learning pipeline. We trained the predictive model with data from 9 different courses run over 6 terms (2016–2018). As a result, we achieved an AUC of 0.87 on the validation set.
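The evolutionary pipeline construction described above can be sketched, in heavily simplified form, as a genetic algorithm over pipeline configurations. Everything below (the search space, the synthetic fitness function, the population settings) is an illustrative assumption, not the paper's actual method; in practice the fitness would be a cross-validated score such as AUC.

```python
import random

# Toy search space: each "pipeline" is a (scaler, model, hyperparameter) triple.
SCALERS = ["none", "minmax", "standard"]
MODELS = ["logreg", "tree", "knn"]

def random_pipeline():
    return (random.choice(SCALERS), random.choice(MODELS), random.uniform(0.01, 10.0))

def fitness(pipeline):
    # Stand-in for a cross-validated AUC: a synthetic score that prefers
    # a standard-scaled logistic regression with a mid-range parameter.
    scaler, model, c = pipeline
    score = 0.5
    score += 0.15 if scaler == "standard" else 0.0
    score += 0.15 if model == "logreg" else 0.05
    score += 0.1 * (1.0 - abs(c - 1.0) / 10.0)
    return score

def mutate(pipeline):
    scaler, model, c = pipeline
    choice = random.randrange(3)
    if choice == 0:
        scaler = random.choice(SCALERS)
    elif choice == 1:
        model = random.choice(MODELS)
    else:
        c = min(10.0, max(0.01, c * random.uniform(0.5, 2.0)))
    return (scaler, model, c)

def evolve(generations=30, pop_size=20, seed=42):
    # Elitist truncation selection: keep the best half, refill with mutants.
    random.seed(seed)
    population = [random_pipeline() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(population, key=fitness)

best = evolve()
```

Real systems in this family replace the synthetic fitness with cross-validation on student data and add crossover between pipelines, but the select-mutate-evaluate loop is the same.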
Many educational institutions have been using online judges in programming classes, amongst other reasons, to provide faster feedback to students and to reduce the teacher's workload. There is some evidence that online judges also help reduce dropout. Nevertheless, dropout rates in introductory programming classes remain high. The objective of this work is therefore to develop and validate a method for predicting student dropout using data from the first two weeks of study, to allow for early intervention. Instead of the classical questionnaire-based method, we opted for a non-subjective, data-driven approach. However, such approaches are known to suffer from a potential overload of factors, not all of which may be relevant to the prediction task. As a result, we reached a very promising 80% accuracy and performed explicit extraction of the main factors leading to student dropout.
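One standard way to extract the main factors behind such a predictor is permutation importance: shuffle one feature at a time and measure how much accuracy drops. The sketch below is illustrative only; the two features, the synthetic labels, and the stand-in "trained model" are assumptions, not the study's actual data or method.

```python
import random

random.seed(1)

# Each row: (submissions in week 1, forum posts in week 1), with a
# synthetic dropout label driven entirely by the first feature.
rows = [(random.randint(0, 20), random.randint(0, 10)) for _ in range(300)]
labels = [1 if s < 8 else 0 for s, _ in rows]  # 1 = dropout

def model(s, p):
    # Stand-in for a trained classifier that learned the true rule.
    return 1 if s < 8 else 0

def accuracy(feature_rows):
    return sum(model(s, p) == y
               for (s, p), y in zip(feature_rows, labels)) / len(labels)

def permutation_importance(col):
    # Shuffle one column and report the accuracy drop it causes.
    shuffled = [row[col] for row in rows]
    random.shuffle(shuffled)
    permuted = [(v, p) if col == 0 else (s, v)
                for (s, p), v in zip(rows, shuffled)]
    return accuracy(rows) - accuracy(permuted)

imp_submissions = permutation_importance(0)
imp_forum_posts = permutation_importance(1)
```

Here shuffling the submissions column destroys accuracy while shuffling the unused forum-posts column leaves it untouched, which is exactly the signal used to rank factors by relevance.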
Introductory programming may be complex for many students, and these courses suffer from high failure and dropout rates. A potential way to tackle this problem is to predict student performance at an early stage, as it facilitates human-AI collaboration towards prescriptive analytics, where instructors/monitors are told how to intervene and support students; early intervention is crucial. However, the literature states that there is no reliable predictor yet for programming students' performance, since even large-scale analyses of multiple features have resulted in only limited predictive power. Notably, Deep Learning (DL) can provide high-quality results for large amounts of data and complex problems. We therefore employed DL for early prediction of students' performance, using data collected in the very first two weeks of introductory programming courses offered to a total of 2058 students over 6 semesters (a longitudinal study). We compared our results with the state of the art, an Evolutionary Algorithm (EA) that automatically creates and optimises machine learning pipelines. Our DL model achieved an average accuracy of 82.5%, which is statistically superior to the model constructed and optimised by the EA (p-value << 0.05 even with Bonferroni correction). In addition, we adapted the DL model into a stacking ensemble for continuous prediction purposes. As a result, our regression model explained ~62% of the final grade variance. In closing, we also provide results on the interpretation of our regression model to understand the leading factors of success and failure in introductory programming.
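The stacking idea above (base predictors whose outputs are combined by a meta-learner) can be sketched in miniature. The two base "models", the least-squares meta-learner, and the tiny grade dataset below are all illustrative assumptions, not the paper's DL ensemble.

```python
# Two hypothetical base regressors over a student's weekly score history.
def base_mean(history):      # predicts the running mean of weekly scores
    return sum(history) / len(history)

def base_last(history):      # predicts the most recent weekly score
    return history[-1]

def fit_meta(preds, targets):
    # Ordinary least squares for y ~ w0*p0 + w1*p1, via the normal equations.
    s00 = sum(p[0] * p[0] for p in preds)
    s01 = sum(p[0] * p[1] for p in preds)
    s11 = sum(p[1] * p[1] for p in preds)
    t0 = sum(p[0] * y for p, y in zip(preds, targets))
    t1 = sum(p[1] * y for p, y in zip(preds, targets))
    det = s00 * s11 - s01 * s01
    w0 = (t0 * s11 - t1 * s01) / det
    w1 = (s00 * t1 - s01 * t0) / det
    return w0, w1

# Hypothetical two-week score histories and final grades.
students = [([6.0, 8.0], 7.5), ([4.0, 4.0], 4.0),
            ([9.0, 7.0], 8.0), ([2.0, 6.0], 5.0)]
preds = [(base_mean(h), base_last(h)) for h, _ in students]
targets = [g for _, g in students]
w0, w1 = fit_meta(preds, targets)

def stacked_predict(history):
    # The meta-learner blends the base predictions into a final-grade estimate.
    return w0 * base_mean(history) + w1 * base_last(history)
```

In a full-scale version the base models would be trained networks, the meta-learner would be fit on out-of-fold predictions to avoid leakage, and the target would be the course's final grade.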
While Massive Open Online Course (MOOC) platforms provide knowledge in a new and unique way, the very high number of dropouts is a significant drawback. Several features are considered to contribute to learner attrition or lack of interest, which may lead to disengagement or total dropout. The jury is still out on which factors are the most appropriate predictors. However, the literature agrees that early prediction is vital to allow for timely intervention. Whilst feature-rich predictors may have the best chance of high accuracy, they may be unwieldy. This study aims to predict learner dropout early on, from the first week, by comparing several machine learning approaches, including Random Forest, AdaBoost, XGBoost and GradientBoost classifiers. The results show promising accuracies (82%-94%) using as few as 2 features. We show that the accuracies obtained outperform state-of-the-art approaches, even when the latter deploy several features.
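Evaluating a two-feature predictor against a trivial baseline, as the comparison above implies, can be sketched as follows. The synthetic cohort, the two engagement features, and the fixed decision rule standing in for a trained classifier are all assumptions for illustration, not the study's data or models.

```python
import random

random.seed(0)

def make_learner():
    videos = random.randint(0, 10)    # videos watched in week 1
    quizzes = random.randint(0, 5)    # quiz attempts in week 1
    # Synthetic ground truth: low early engagement tends to precede dropout.
    dropped = 1 if videos + 2 * quizzes < 8 else 0
    if random.random() < 0.1:         # 10% label noise
        dropped = 1 - dropped
    return videos, quizzes, dropped

data = [make_learner() for _ in range(200)]

def rule_classifier(videos, quizzes):
    # Two-feature decision rule standing in for a trained classifier.
    return 1 if videos + 2 * quizzes < 8 else 0

def majority_baseline(videos, quizzes):
    return 0  # most learners in this synthetic cohort do not drop out

def accuracy(clf):
    return sum(clf(v, q) == y for v, q, y in data) / len(data)

acc_rule = accuracy(rule_classifier)
acc_base = accuracy(majority_baseline)
```

Any real candidate model (Random Forest, boosted trees, etc.) would be held to the same test: its week-one accuracy must clearly beat the majority-class baseline before being considered for deployment.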