Best Practices in Supervised Machine Learning: A Tutorial for Psychologists

Pargent, Florian; Schoedel, Ramona; Stachl, Clemens

doi:10.31234/osf.io/89snd

Cited by 10 publications

(11 citation statements)

References 58 publications

“…Individual decision trees recursively split the feature space (rules to distinguish classes) with the goal to separate the different classes of the criterion (drop out vs. remain in our case). For a detailed description of how individual decision trees operate and translate to a random forest see Pargent, Schoedel & Stachl [80].…”

Section: Methods (mentioning)

confidence: 99%

Using machine learning to predict student retention from socio-demographic characteristics and app-based engagement metrics

Matz, Bukow, Peters, et al. (2023)

Sci Rep

Self Cite

Student attrition poses a major challenge to academic institutions, funding bodies and students. With the rise of Big Data and predictive analytics, a growing body of work in higher education research has demonstrated the feasibility of predicting student dropout from readily available macro-level (e.g., socio-demographics or early performance metrics) and micro-level data (e.g., logins to learning management systems). Yet, the existing work has largely overlooked a critical meso-level element of student success known to drive retention: students’ experience at university and their social embeddedness within their cohort. In partnership with a mobile application that facilitates communication between students and universities, we collected both (1) institutional macro-level data and (2) behavioral micro and meso-level engagement data (e.g., the quantity and quality of interactions with university services and events as well as with other students) to predict dropout after the first semester. Analyzing the records of 50,095 students from four US universities and community colleges, we demonstrate that the combined macro and meso-level data can predict dropout with high levels of predictive performance (average AUC across linear and non-linear models = 78%; max AUC = 88%). Behavioral engagement variables representing students’ experience at university (e.g., network centrality, app engagement, event ratings) were found to add incremental predictive power beyond institutional variables (e.g., GPA or ethnicity). Finally, we highlight the generalizability of our results by showing that models trained on one university can predict retention at another university with reasonably high levels of predictive performance.
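
The AUC values reported in this abstract (average 78%, maximum 88%) summarize how well predicted dropout risk ranks students: AUC is the probability that a randomly chosen student who dropped out receives a higher predicted risk than a randomly chosen student who stayed, with ties counting as half. The Python sketch below computes AUC directly from that pairwise definition. The labels and scores are hypothetical toy values, not data from the study, and `auc_pairwise` is an illustrative helper, not a library function.

```python
from itertools import product


def auc_pairwise(labels, scores):
    """ROC AUC via its pairwise (Mann-Whitney) definition.

    labels: 1 for the positive class (e.g., dropout), 0 otherwise.
    scores: predicted risk; higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Score every positive-negative pair: 1 if ranked correctly, 0.5 if tied.
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 3 students who dropped out (1), 3 who stayed (0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(auc_pairwise(labels, scores))  # 8 of 9 pairs ranked correctly -> 8/9
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The double loop is quadratic in sample size, which is fine for illustration; library implementations such as scikit-learn's `roc_auc_score` sort the scores once instead of comparing every pair.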

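
The first quoted statement describes decision trees as recursively splitting the feature space to separate the classes of the criterion (dropout vs. retention). The Python sketch below performs one step of that recursion using Gini impurity, a common splitting criterion for classification trees that is assumed here (the quoted statement does not say which criterion was used): it searches a single numeric feature for the threshold whose two child nodes are purest. The feature (events attended) and all values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    impurity = 1.0
    for cls in set(labels):
        p = labels.count(cls) / n
        impurity -= p * p
    return impurity


def best_split(x, y):
    """Find the threshold t on one numeric feature x that minimizes the
    size-weighted Gini impurity of the child nodes (x <= t vs. x > t).

    Returns (threshold, weighted_impurity); threshold is None if no split
    improves on the parent node.
    """
    n = len(y)
    best = (None, gini(y))  # baseline: impurity of the unsplit parent node
    values = sorted(set(x))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: events attended in the first semester vs. outcome.
events = [0, 1, 1, 2, 5, 6, 7, 9]
dropout = ["drop", "drop", "drop", "stay", "stay", "stay", "stay", "stay"]
print(best_split(events, dropout))  # (1.5, 0.0): both child nodes are pure
```

A full tree applies the same search to every feature, keeps the best split, and recurses into each child node until a stopping rule is met. A random forest aggregates many such trees, each grown on a bootstrap sample with a random subset of features considered at each split.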

“…The development of such models requires diagnostic, meaning sensitive and privacy protected, information about individuals. Hence there are many challenges and professional requirements that need to be met for safe and ethical handling and development of such models (Pargent, Schoedel, & Stachl, 2022).…”

Section: Intentionally Modelling Vulnerability (mentioning)

confidence: 99%

Against Algorithmic Exploitation of Human Vulnerabilities

Strümke, Slavkovik, Stachl (2023)

Preprint

Decisions such as which movie to watch next, which song to listen to, or which product to buy online, are increasingly influenced by recommender systems and user models that incorporate information on users' past behaviours, preferences, and digitally created content. Machine learning models that enable recommendations and that are trained on user data may unintentionally leverage information on human characteristics that are considered vulnerabilities, such as depression, young age, or gambling addiction. The use of algorithmic decisions based on latent vulnerable state representations could be considered manipulative and could have a deteriorating impact on the condition of vulnerable individuals. In this paper, we are concerned with the problem of machine learning models inadvertently modelling vulnerabilities, and want to raise awareness for this issue to be considered in legislation and AI ethics. Hence, we define and describe common vulnerabilities, and illustrate cases where they are likely to play a role in algorithmic decision-making. We propose a set of requirements for methods to detect the potential for vulnerability modelling, detect whether vulnerable groups are treated differently by a model, and detect whether a model has created an internal representation of vulnerability. We conclude that explainable artificial intelligence methods may be necessary for detecting vulnerability exploitation by machine learning-based recommendation systems.

“…The basic approach to detect overfitting in ML is to split the sample data into two parts: a training set whose observations are used to train the algorithm and a test set whose observations are predicted to estimate the performance of the trained algorithm on new, unseen data (Yarkoni & Westfall, 2017). Since going into detail about the terminology and foundations of ML would be beyond the scope of this paper, we refer newcomers to Pargent et al (2022) or Yarkoni and Westfall (2017) for introductions more tailored to psychologists; for a special focus on personality research and assessment, see Stachl et al (2020); for a special focus on clinical psychology, see Dwyer et al (2018).…”

Section: In Psychological Science (mentioning)

confidence: 99%

Everything has its price: Foundations of cost-sensitive machine learning and its application in psychology.

Sterner, Goretzko, Pargent (2023)

Psychological Methods

Psychology has seen an increase in the use of machine learning (ML) methods. In many applications, observations are classified into one of two groups (binary classification). Off-the-shelf classification algorithms assume that the costs of a misclassification (false positive or false negative) are equal. Because this is often not reasonable (e.g., in clinical psychology), cost-sensitive machine learning (CSL) methods can take different cost ratios into account. We present the mathematical foundations and introduce a taxonomy of the most commonly used CSL methods, before demonstrating their application and usefulness on psychological data, that is, the drug consumption data set (N = 1,885) from the University of California Irvine ML Repository. In our example, all demonstrated CSL methods noticeably reduced mean misclassification costs compared to regular ML algorithms. We discuss the necessity for researchers to perform small benchmarks of CSL methods for their own practical application. Thus, our open materials provide R code, demonstrating how CSL methods can be applied within the mlr3 framework (https://osf.io/cvks7/).
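
As this abstract notes, off-the-shelf classifiers implicitly treat false positives and false negatives as equally costly. One simple cost-sensitive strategy is thresholding: if a model outputs a calibrated probability p of the positive class, predicting "positive" has expected cost (1 − p)·c_FP and predicting "negative" has expected cost p·c_FN, so the cheaper choice is "positive" whenever p > c_FP / (c_FP + c_FN). The Python sketch below compares that threshold with the default 0.5 on hypothetical probabilities and an assumed 1:5 cost ratio. It illustrates one CSL technique under these assumptions; it is not the authors' mlr3 code from their open materials.

```python
def cost_threshold(c_fp, c_fn):
    """Probability threshold that minimizes expected misclassification cost.

    Predicting positive costs (1 - p) * c_fp in expectation, predicting
    negative costs p * c_fn, so predict positive when p > c_fp / (c_fp + c_fn).
    Assumes p is a calibrated probability of the positive class.
    """
    return c_fp / (c_fp + c_fn)


def mean_cost(y_true, p, threshold, c_fp, c_fn):
    """Mean misclassification cost of predictions thresholded at `threshold`."""
    total = 0.0
    for y, prob in zip(y_true, p):
        pred = 1 if prob > threshold else 0
        if pred == 1 and y == 0:
            total += c_fp  # false positive
        elif pred == 0 and y == 1:
            total += c_fn  # false negative
    return total / len(y_true)


# Hypothetical example: a missed case (FN) is assumed 5x as costly as a false alarm.
c_fp, c_fn = 1.0, 5.0
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
p_hat = [0.80, 0.35, 0.30, 0.10, 0.60, 0.25, 0.05, 0.20]

t_cs = cost_threshold(c_fp, c_fn)  # 1 / 6, about 0.167
print("default 0.5:", mean_cost(y_true, p_hat, 0.5, c_fp, c_fn))      # 1.375
print("cost-sensitive:", mean_cost(y_true, p_hat, t_cs, c_fp, c_fn))  # 0.375
```

With equal costs the threshold reduces to the usual 0.5. Lowering it trades extra false positives for fewer expensive false negatives; in a real analysis the comparison would be made on held-out data rather than on the observations used to fit the model.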

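
The passage quoted from Sterner et al. describes the basic holdout approach for detecting overfitting: train on one part of the sample, then estimate performance on held-out test observations. The stdlib-only Python sketch below makes this concrete with a model that overfits by construction, a 1-nearest-neighbour classifier (chosen here only for illustration), on synthetic data whose labels are pure noise. The data generator, the 25% test share, the seed, and all helper names are illustrative assumptions.

```python
import random


def make_noise_data(n, seed=0):
    """Synthetic data whose label is independent of the features,
    so no model can truly beat chance (about 50% accuracy)."""
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def holdout_split(X, y, test_share=0.25, seed=0):
    """Randomly hold out a share of the observations as a test set."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_share)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def predict_1nn(X_train, y_train, x):
    """1-nearest-neighbour: copy the label of the closest training point.
    It memorizes the training set, which makes overfitting easy to see."""
    best = min(range(len(X_train)),
               key=lambda i: (X_train[i][0] - x[0]) ** 2 + (X_train[i][1] - x[1]) ** 2)
    return y_train[best]


def accuracy(X_train, y_train, X_eval, y_eval):
    """Share of evaluation cases whose 1-NN prediction matches the label."""
    hits = sum(predict_1nn(X_train, y_train, x) == label
               for x, label in zip(X_eval, y_eval))
    return hits / len(y_eval)


X, y = make_noise_data(400)
X_tr, y_tr, X_te, y_te = holdout_split(X, y)
print("training accuracy:", accuracy(X_tr, y_tr, X_tr, y_tr))  # 1.0: memorized
print("test accuracy:", accuracy(X_tr, y_tr, X_te, y_te))      # near chance
```

Training accuracy is perfect because every training point is its own nearest neighbour, while test accuracy stays near 50%. That gap between in-sample and held-out performance is exactly the signal of overfitting that the holdout approach is designed to reveal.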
