2023
DOI: 10.1177/25152459231162559
Best Practices in Supervised Machine Learning: A Tutorial for Psychologists

Abstract: Supervised machine learning (ML) is becoming an influential analytical method in psychology and other social sciences. However, theoretical ML concepts and predictive-modeling techniques are not yet widely taught in psychology programs. This tutorial is intended to provide an intuitive but thorough primer and introduction to supervised ML for psychologists in four consecutive modules. After introducing the basic terminology and mindset of supervised ML, in Module 1, we cover how to use resampling methods to ev…


Cited by 20 publications (12 citation statements)
References 79 publications
“…Doubly robust estimators provide a partial solution to this problem by combining both nuisance models; they give us two shots to get things right: As long as one of the nuisance models is correctly specified, the resulting estimate does not suffer from misspecification bias. However, this doubly robust property comes at a cost: Although (systematic) bias may be reduced, variance increases in comparison with g-computation (Tan, 2007); thus, we face a bias-variance trade-off (Pargent et al., 2023).…”
Section: Doubly Robust Standardization
Citation type: mentioning
Confidence: 99%
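As a concrete illustration of the "two shots" idea (our addition, not taken from the citing paper): a standard doubly robust estimator of the average treatment effect is the augmented inverse-probability-weighting (AIPW) form, which combines the outcome (g-computation) models $\hat{m}_1, \hat{m}_0$ with the propensity model $\hat{e}$; the notation below is generic, with $A_i$ the treatment, $Y_i$ the outcome, and $X_i$ the covariates.

$$
\hat{\tau}_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{m}_1(X_i)-\hat{m}_0(X_i)
+\frac{A_i\,\{Y_i-\hat{m}_1(X_i)\}}{\hat{e}(X_i)}
-\frac{(1-A_i)\,\{Y_i-\hat{m}_0(X_i)\}}{1-\hat{e}(X_i)}\right]
$$

The correction terms vanish in expectation whenever either nuisance model is correct, which is why misspecifying one of the two does not bias the estimate.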
“…Critically, ML has the potential to elucidate whether there are variable interactions that differentiate between categories. Most importantly, ML provides greater flexibility compared with other statistical methodologies previously used in studies evaluating differences between anorexia diagnoses (Pargent et al., 2023).…”
Section: Improving ED Diagnostic Classification via Machine Learning
Citation type: mentioning
Confidence: 99%
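A minimal sketch of the interaction point above, on synthetic data (ours, not from the citing study): in an XOR-like problem, class membership depends only on the interaction of two variables, so a main-effects logistic regression performs at chance while a decision tree recovers the structure.

```python
# Illustrative only: synthetic XOR data where no single variable
# separates the classes, but their interaction does.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)  # class = interaction of signs

# Main-effects model: roughly chance-level accuracy (~0.5).
print("logistic:", cross_val_score(LogisticRegression(), X, y, cv=5).mean())
# Tree model: captures the interaction without it being pre-specified.
print("tree:    ", cross_val_score(DecisionTreeClassifier(), X, y, cv=5).mean())
```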
“…The basic approach to detect overfitting in ML is to split the sample data into two parts: a training set whose observations are used to train the algorithm and a test set whose observations are predicted to estimate the performance of the trained algorithm on new, unseen data (Yarkoni & Westfall, 2017). Since going into detail about the terminology and foundations of ML would be beyond the scope of this paper, we refer newcomers to Pargent et al. (2022) or Yarkoni and Westfall (2017) for introductions more tailored to psychologists; for a special focus on personality research and assessment, see Stachl et al. (2020); for a special focus on clinical psychology, see Dwyer et al. (2018).…”
Section: In Psychological Science
Citation type: mentioning
Confidence: 99%
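A minimal scikit-learn sketch of the train/test split described in the excerpt, on synthetic data (our illustration, not the tutorial's own code): fit on the training set, then compare training and test accuracy; a large gap signals overfitting.

```python
# Illustrative sketch of the basic holdout approach to detecting overfitting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Training accuracy is optimistically biased; the test set estimates
# performance on new, unseen data.
print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test accuracy: ", accuracy_score(y_test, model.predict(X_test)))
```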
“…The outer resampling loop is used to evaluate predictive performance (similar to the case without any hyperparameters), while an inner resampling loop is created in which the tuning procedure as described above is repeated within each of the outer training sets. A more detailed description of nested resampling can be found in Lang et al. (2020) or Pargent et al. (2022). If accuracy is used in the inner tuning loop, costs are not taken into account when selecting optimal hyperparameter values, even if the […]…”
Cost-sensitive methods listed in the excerpt: undersampling/oversampling (Elkan, 2001); SMOTE (Chawla et al., 2002; Fernàndez et al., 2018); MetaCost (Domingos, 1999).
Section: Cost-Sensitive Hyperparameter Tuning
Citation type: mentioning
Confidence: 99%
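A minimal scikit-learn sketch of nested resampling (our illustration; the estimator, grid, and data are assumptions, not the citing paper's setup): GridSearchCV provides the inner tuning loop, and cross_val_score wraps it as the outer evaluation loop, so tuning is repeated within each outer training set.

```python
# Illustrative nested resampling: inner loop tunes, outer loop evaluates.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop: 3-fold CV selects C within each outer training set.
# Passing a cost-sensitive scorer via scoring= here (instead of the
# default accuracy) would make the tuning cost-aware, which is the
# issue the excerpt raises.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)

# Outer loop: 5-fold CV estimates the performance of the whole
# tuning-plus-fitting procedure on unseen data.
scores = cross_val_score(inner, X, y, cv=5)
print("nested CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```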