“…Results show that the performance of predictive models strongly varies across courses, even when they are generated with data collected from a single institution. In Sukhbaatar et al. [38], the authors used decision tree analysis on LMS data to predict, by the middle of the semester, students at risk of failing or dropping out of a blended course. Results showed that this approach worked well to predict the dropouts.…”
Algorithms and programming are among the most challenging topics students face in undergraduate programs. Dropout and failure rates in courses covering these topics are usually high, which has drawn attention to strategies for mitigating the problem. Machine learning techniques can help by providing models that detect at-risk students earlier, so that lecturers, tutors, or staff can intervene pedagogically. To predict at-risk students early in introductory programming courses, we present a comparative study aiming to find the best combination of datasets (sets of variables) and classification algorithms. Data collected from Moodle were used to generate 13 distinct datasets based on different aspects of student interactions (cognitive presence, social presence, and teaching presence) inside the virtual environment. Results show that there is no statistically significant difference among models generated from the different datasets and that the counts of interactions together with derived attributes are sufficient for the task. Model performance varied by semester, with the best models able to detect at-risk students in the first week of the course with AUC ROC from 0.7 to 0.9. Moreover, using SMOTE to balance the datasets did not improve model performance.
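The abstract reports model quality as AUC ROC. As a minimal sketch of what that metric measures, the function below computes it as the fraction of (at-risk, not-at-risk) student pairs in which the at-risk student receives the higher score, counting ties as half. The function name `auc_roc` and the example labels and scores are hypothetical and are not taken from the study.

```python
def auc_roc(labels, scores):
    """AUC ROC via pairwise comparison (Mann-Whitney formulation).

    labels: 1 for an at-risk student, 0 otherwise (hypothetical encoding).
    scores: model scores, higher meaning "more likely at risk".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC ROC needs at least one example of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # at-risk student ranked above the other student
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four students; two are at risk.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(example_labels, example_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported range of 0.7 to 0.9.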
“…In addition, DTs have prediction rules that are easy to interpret (Louppe, 2014). These characteristics make this model a popular and widely used learning algorithm for predicting school dropout (Pereira & Zambrano, 2017; Sukhbaatar et al., 2018). RF is a model based on decision trees that handles high-dimensional datasets well (Hastie et al., 2009).…”
School dropout is a daily challenge for educational institutions; in higher education specifically, high dropout rates lead to financial losses and a shortage of professionals in the job market. This research aimed to develop and evaluate a predictive model to identify students prone to dropping out, using data from a semester-based self-assessment model for the undergraduate programs of the Universidade Federal da Paraíba (UFPB). Using educational data mining and the CRISP-EDM methodology, the study analyzed the relationship between dropout and institutional self-assessment, followed by exploratory analysis and data preparation for classification. Several modeling techniques, such as Decision Tree, Random Forest, and Support Vector Machines, were applied, and the models were evaluated with performance metrics, yielding an accuracy of 87.97%, precision of 91.72%, recall of 91.67%, and F-measure of 91.57% in identifying students with a high probability of dropping out. About 59% of active UFPB students admitted from 2017 onward showed a probability of abandoning their programs in tests of the proposed predictive model. This information can support institutional decisions and the implementation of effective policies and actions against dropout, with the aim of improving academic outcomes. The study contributes to advances in dropout prediction, providing valuable insights for decisions and preventive strategies at UFPB and other higher education institutions.
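The four metrics reported above (accuracy, precision, recall, F-measure) all derive from a binary confusion matrix. The sketch below shows the standard formulas under the assumption that "dropout" is the positive class; the function name and the counts are hypothetical and do not reproduce the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts.

    tp: dropouts correctly flagged      fp: non-dropouts wrongly flagged
    fn: dropouts missed                 tn: non-dropouts correctly passed
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f_measure = ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```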
“…This underscores the pressing need for comprehensive methodologies that seamlessly integrate feature selection with predictive modeling. Table 2 provides a succinct summary of diverse feature selection methods employed in EDM research, including manual selection [25]-[30], filter-based techniques such as correlation and information gain [31]-[37], and wrapper methods such as genetic algorithms and Principal Component Analysis (PCA) [38]-[43]. These diverse approaches collectively contribute to the evolving landscape of feature selection in EDM, paving the way for more robust predictive models.…”
Educational Data Mining (EDM) is used to improve the teaching and learning process by analyzing and classifying data that can be applied to predict students' academic performance and dropout rates, as well as instructors' performance. The prediction of student performance is complicated by the vast and diverse range of variables, from academic records to behavioral and health metrics. In this paper, we introduce a new Adaptive Feature Selection Algorithm (AFSA) that combines an ensemble approach for initial feature ranking with normalized mean ranking from five distinct methods to enhance robustness. The proposed method iteratively selects the best features by adjusting its threshold based on each feature's rank, which ensures that selected features contribute significantly to model accuracy and reduces dataset complexity. We tested the proposed feature selection algorithm using five machine learning classifiers, Logistic Regression (LR), K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naïve Bayes (NB), and Decision Tree (DT), on four student performance datasets. The experimental results show that the proposed method decreases the feature count by an average reduction factor of 5.7, streamlining datasets while maintaining competitive cross-validation accuracy, which makes it a valuable tool in the field of educational data analytics.
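The abstract does not spell out how the ranks are combined, so the following is only an illustrative sketch of one way to compute a normalized mean rank across several ranking methods and then keep features below a rank threshold. It is not the authors' AFSA: the function names, the tie-breaking rule, the fixed threshold, and the example scores are all assumptions.

```python
def normalized_mean_rank(score_tables):
    """Average each feature's normalized rank over several ranking methods.

    score_tables: list of dicts mapping feature name -> importance score
    (higher is better), one dict per ranking method.
    Returns a dict mapping feature name -> mean rank in [0, 1], 0 being best.
    """
    features = sorted(score_tables[0])
    n = len(features)
    totals = {f: 0.0 for f in features}
    for scores in score_tables:
        # Best feature first; ties keep alphabetical order (an assumption).
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for position, feature in enumerate(ordered):
            totals[feature] += position / (n - 1) if n > 1 else 0.0
    return {f: totals[f] / len(score_tables) for f in features}


def select_features(mean_ranks, threshold):
    """Keep features whose mean normalized rank is at or below the threshold."""
    return sorted(f for f, rank in mean_ranks.items() if rank <= threshold)


# Hypothetical scores from two ranking methods for three features.
method_a = {"attendance": 0.9, "forum_posts": 0.5, "age": 0.1}
method_b = {"attendance": 0.8, "forum_posts": 0.2, "age": 0.4}
ranks = normalized_mean_rank([method_a, method_b])
print(ranks)
print(select_features(ranks, threshold=0.5))
```

An adaptive variant would adjust `threshold` between iterations, for example based on cross-validation accuracy; the fixed threshold here only keeps the sketch short.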