2021
DOI: 10.48550/arxiv.2103.14513
Preprint

Predictive and explanatory models might miss informative features in educational data

Nicholas T. Young,
Marcos D. Caballero

Abstract: We encounter variables with little variation often in educational data mining (EDM) and discipline-based education research (DBER) due to the demographics of higher education and the questions we ask. Yet, little work has examined how to analyze such data. Therefore, we conducted a simulation study using logistic regression, penalized regression, and random forest. We systematically varied the fraction of positive outcomes, feature imbalances, and odds ratios. We find the algorithms treat features with the sam…
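As a rough, hypothetical sketch (not the authors' code) of the simulation design the abstract describes, the snippet below generates a rare binary feature with a chosen imbalance and odds ratio inside a logistic data-generating model, then checks how well plain logistic regression recovers the effect. The function and parameter names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate(n, feature_imbalance, odds_ratio, base_rate):
    """Binary feature with a given fraction of 1s; outcome drawn from a logistic model."""
    x = rng.binomial(1, feature_imbalance, size=n)
    intercept = np.log(base_rate / (1 - base_rate))  # controls fraction of positive outcomes
    p = 1 / (1 + np.exp(-(intercept + np.log(odds_ratio) * x)))
    y = rng.binomial(1, p)
    return x.reshape(-1, 1), y

# Hypothetical settings: 5% of cases have the feature, odds ratio 2, 20% base rate of the outcome.
X, y = simulate(n=10_000, feature_imbalance=0.05, odds_ratio=2.0, base_rate=0.2)
coef = LogisticRegression().fit(X, y).coef_[0, 0]
print(np.exp(coef))  # estimated odds ratio; compare against the true value of 2.0
```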


Cited by 1 publication (1 citation statement)
References 83 publications (115 reference statements)
“…First, we used the AUC permutation feature importance [51] as it is claimed to be less biased than the accuracy-based permutation importance when input features differ in scale (as do our factors listed in Table II) and when the predicted variable is not split evenly between the two outcomes. In practice, our previous work suggests which method we pick will have minimal effect on the conclusions [52]. Under this approach, each feature is randomly permuted and then passed through the model to make a prediction.…”
Section: The Random Forest Algorithm
Citation type: mentioning; confidence: 99%
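For readers unfamiliar with the procedure the quoted passage describes, here is a minimal, hypothetical sketch of AUC-based permutation feature importance with a scikit-learn random forest. It is not the implementation cited as [51]; the helper name and settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def auc_permutation_importance(model, X, y, n_repeats=10, seed=None):
    """Mean drop in AUC when each feature is shuffled; a larger drop means a more important feature."""
    rng = np.random.default_rng(seed)
    baseline = roc_auc_score(y, model.predict_proba(X)[:, 1])
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature's link to the outcome
            drops.append(baseline - roc_auc_score(y, model.predict_proba(X_perm)[:, 1]))
        importances[j] = np.mean(drops)
    return importances

# Imbalanced toy data (10% positive outcomes) to mimic the uneven split discussed in the quote.
X, y = make_classification(n_samples=2000, n_features=6, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(auc_permutation_importance(model, X_test, y_test, seed=0))
```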