2023
DOI: 10.1177/25152459231162559
Best Practices in Supervised Machine Learning: A Tutorial for Psychologists

Abstract: Supervised machine learning (ML) is becoming an influential analytical method in psychology and other social sciences. However, theoretical ML concepts and predictive-modeling techniques are not yet widely taught in psychology programs. This tutorial is intended to provide an intuitive but thorough primer and introduction to supervised ML for psychologists in four consecutive modules. After introducing the basic terminology and mindset of supervised ML, in Module 1, we cover how to use resampling methods to ev…


Cited by 20 publications (12 citation statements)
References 79 publications
“…Doubly robust estimators provide a partial solution to this problem by combining both nuisance models; they give us two shots to get things right: As long as one of the nuisance models is correctly specified, the resulting estimate does not suffer from misspecification bias. However, this doubly robust property comes at a cost: Although (systematic) bias may be reduced, variance increases in comparison with g-computation (Tan, 2007); thus, we face a bias-variance trade-off (Pargent et al., 2023).…”
Section: Doubly Robust Standardization
Citation type: mentioning
Confidence: 99%
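As a concrete illustration of the "two shots" idea (our addition, not taken from the citing paper): a standard doubly robust estimator of the average treatment effect is the augmented inverse-probability-weighting (AIPW) form, which combines the outcome (g-computation) models $\hat{m}_1, \hat{m}_0$ with the propensity model $\hat{e}$; the notation below is generic, with $A_i$ the treatment, $Y_i$ the outcome, and $X_i$ the covariates.

$$
\hat{\tau}_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{m}_1(X_i)-\hat{m}_0(X_i)
+\frac{A_i\,\{Y_i-\hat{m}_1(X_i)\}}{\hat{e}(X_i)}
-\frac{(1-A_i)\,\{Y_i-\hat{m}_0(X_i)\}}{1-\hat{e}(X_i)}\right]
$$

The correction terms vanish in expectation whenever either nuisance model is correct, which is why misspecifying one of the two does not bias the estimate.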
“…Critically, ML has the potential to elucidate whether there are variable interactions that differentiate between categories. Most importantly, ML provides greater flexibility compared with other statistical methodologies previously used in studies evaluating differences between anorexia diagnoses (Pargent et al., 2023).…”
Section: Improving ED Diagnostic Classification via Machine Learning
Citation type: mentioning
Confidence: 99%
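A minimal sketch of the interaction point above, on synthetic data (ours, not from the citing study): in an XOR-like problem, class membership depends only on the interaction of two variables, so a main-effects logistic regression performs at chance while a decision tree recovers the structure.

```python
# Illustrative only: synthetic XOR data where no single variable
# separates the classes, but their interaction does.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)  # class = interaction of signs

# Main-effects model: roughly chance-level accuracy (~0.5).
print("logistic:", cross_val_score(LogisticRegression(), X, y, cv=5).mean())
# Tree model: captures the interaction without it being pre-specified.
print("tree:    ", cross_val_score(DecisionTreeClassifier(), X, y, cv=5).mean())
```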
“…The basic approach to detect overfitting in ML is to split the sample data into two parts: a training set whose observations are used to train the algorithm and a test set whose observations are predicted to estimate the performance of the trained algorithm on new, unseen data (Yarkoni & Westfall, 2017). Since going into detail about the terminology and foundations of ML would be beyond the scope of this paper, we refer newcomers to Pargent et al. (2022) or Yarkoni and Westfall (2017) for introductions more tailored to psychologists; for a special focus on personality research and assessment, see Stachl et al. (2020); for a special focus on clinical psychology, see Dwyer et al. (2018).…”
Section: In Psychological Science
Citation type: mentioning
Confidence: 99%
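A minimal scikit-learn sketch of the train/test split described in the excerpt, on synthetic data (our illustration, not the tutorial's own code): fit on the training set, then compare training and test accuracy; a large gap signals overfitting.

```python
# Illustrative sketch of the basic holdout approach to detecting overfitting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Training accuracy is optimistically biased; the test set estimates
# performance on new, unseen data.
print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test accuracy: ", accuracy_score(y_test, model.predict(X_test)))
```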
“…The outer resampling loop is used to evaluate predictive performance (similar to the case without any hyperparameters), while an inner resampling loop is created in which the tuning procedure as described above is repeated within each of the outer training sets. A more detailed description of nested resampling can be found in Lang et al. (2020) or Pargent et al. (2022). If accuracy is used in the inner tuning loop, costs are not taken into account when selecting optimal hyperparameter values, even if the […]…”
Cost-sensitive methods listed in the excerpt: undersampling/oversampling (Elkan, 2001); SMOTE (Chawla et al., 2002; Fernàndez et al., 2018); MetaCost (Domingos, 1999).
Section: Cost-Sensitive Hyperparameter Tuning
Citation type: mentioning
Confidence: 99%
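A minimal scikit-learn sketch of nested resampling (our illustration; the estimator, grid, and data are assumptions, not the citing paper's setup): GridSearchCV provides the inner tuning loop, and cross_val_score wraps it as the outer evaluation loop, so tuning is repeated within each outer training set.

```python
# Illustrative nested resampling: inner loop tunes, outer loop evaluates.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop: 3-fold CV selects C within each outer training set.
# Passing a cost-sensitive scorer via scoring= here (instead of the
# default accuracy) would make the tuning cost-aware, which is the
# issue the excerpt raises.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)

# Outer loop: 5-fold CV estimates the performance of the whole
# tuning-plus-fitting procedure on unseen data.
scores = cross_val_score(inner, X, y, cv=5)
print("nested CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```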