Background: Directed acyclic graphs (DAGs) are an increasingly popular approach for identifying confounding variables that require conditioning when estimating causal effects. This review examined the use of DAGs in applied health research to inform recommendations for improving their transparency and utility in future research.

Methods: Original health research articles published during 1999–2017 that mentioned ‘directed acyclic graphs’ (or similar) or cited DAGitty were identified from Scopus, Web of Science, Medline and Embase. Data were extracted on the reporting of estimands, DAGs and adjustment sets, alongside the characteristics of each article’s largest DAG.

Results: A total of 234 articles were identified that reported using DAGs. A fifth (n = 48, 21%) reported their target estimand(s) and around half (n = 115, 48%) reported the adjustment set(s) implied by their DAG(s). Almost two-thirds of the articles (n = 144, 62%) made at least one DAG available. DAGs varied in size but averaged 12 nodes [interquartile range (IQR): 9–16, range: 3–28] and 29 arcs (IQR: 19–42, range: 3–99). The median saturation (i.e. the percentage of total possible arcs) was 46% (IQR: 31–67, range: 12–100). Over a third of the DAGs (n = 53, 37%) included unobserved variables, 17% (n = 25) included ‘super-nodes’ (i.e. nodes containing more than one variable) and 34% (n = 49) were visually arranged so that the constituent arcs flowed in the same direction (e.g. top-to-bottom).

Conclusion: There is substantial variation in the use and reporting of DAGs in applied health research. Although this partly reflects their flexibility, it also highlights some potential areas for improvement. This review hence offers several recommendations to improve the reporting and use of DAGs in future research.
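The saturation metric used above (arcs as a percentage of the total possible arcs) can be made concrete with a short sketch. The helper name below is ours, not from the review, and the 12-node, 29-arc example simply reuses the review's reported averages:

```python
def dag_saturation(n_nodes: int, n_arcs: int) -> float:
    """Saturation of a DAG: observed arcs as a % of the maximum possible.

    Any DAG admits a topological ordering, so each unordered pair of
    nodes can carry at most one arc: n * (n - 1) / 2 arcs in total.
    """
    max_arcs = n_nodes * (n_nodes - 1) // 2
    return 100 * n_arcs / max_arcs

# A DAG with the review's average size: 12 nodes and 29 arcs
print(round(dag_saturation(12, 29)))  # ~44% saturated
```

A fully saturated DAG (e.g. 4 nodes and all 6 possible arcs) returns 100.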
Machine learning methods, combined with large electronic health databases, could enable a personalised approach to medicine through improved diagnosis and prediction of individual responses to therapies. If successful, this strategy would represent a revolution in clinical research and practice. However, although the vision of individually tailored medicine is alluring, there is a need to distinguish genuine potential from hype. We argue that the goal of personalised medical care faces serious challenges, many of which cannot be addressed through algorithmic complexity, and call for collaboration between traditional methodologists and experts in medical machine learning to avoid extensive research waste.
Background: In longitudinal data, it is common to create ‘change scores’ by subtracting measurements taken at baseline from those taken at follow-up, and then to analyse the resulting ‘change’ as the outcome variable. In observational data, this approach can produce misleading causal-effect estimates. The present article uses directed acyclic graphs (DAGs) and simple simulations to provide an accessible explanation for why change scores do not estimate causal effects in observational data.

Methods: Data were simulated to match three general scenarios in which the outcome variable at baseline was a (i) ‘competing exposure’ (i.e. a cause of the outcome that is neither caused by nor causes the exposure), (ii) confounder or (iii) mediator for the total causal effect of the exposure variable at baseline on the outcome variable at follow-up. Regression coefficients were compared between change-score analyses and the appropriate estimator(s) for the total and/or direct causal effect(s).

Results: Change-score analyses do not provide meaningful causal-effect estimates unless the baseline outcome variable is a ‘competing exposure’ for the effect of the exposure on the outcome at follow-up. Where the baseline outcome is a confounder or mediator, change-score analyses evaluate obscure estimands, which may diverge substantially in magnitude and direction from the total and direct causal effects.

Conclusion: Future observational studies that seek causal-effect estimates should avoid analysing change scores and adopt alternative analytical strategies.
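The kind of simple simulation described above is easy to reproduce. The sketch below is our own illustration of scenario (ii), where the baseline outcome confounds the exposure; the coefficients (0.5, 0.3, 0.7) are arbitrary assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Scenario (ii): baseline outcome Y0 is a confounder (Y0 -> X, Y0 -> Y1)
y0 = rng.normal(size=n)
x = 0.5 * y0 + rng.normal(size=n)              # baseline outcome causes exposure
y1 = 0.3 * x + 0.7 * y0 + rng.normal(size=n)   # true causal effect of X on Y1 is 0.3

def ols_slope(cols, outcome):
    """Coefficient on the first covariate from an OLS fit with intercept."""
    design = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta[1]

change = ols_slope([x], y1 - y0)    # change-score analysis: biased (~0.18 here)
correct = ols_slope([x, y0], y1)    # adjust for baseline outcome: recovers ~0.3

print(round(change, 2), round(correct, 2))
```

The change-score coefficient diverges from the true effect because subtracting Y0 implicitly fixes its coefficient at 1 instead of adjusting for it.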
Background: Four models are commonly used to adjust for energy intake when estimating the causal effect of a dietary component on an outcome: 1) the “standard model” adjusts for total energy intake; 2) the “energy partition model” adjusts for remaining energy intake; 3) the “nutrient density model” rescales the exposure as a proportion of total energy; and 4) the “residual model” indirectly adjusts for total energy by using a residual. It remains underappreciated that each approach evaluates a different estimand and only partially accounts for confounding by common dietary causes.

Objective: To clarify the implied causal estimand and interpretation of each model and evaluate their performance in reducing dietary confounding.

Design: Semi-parametric directed acyclic graphs and Monte Carlo simulations were used to identify the estimands and interpretations implied by each model and explore their performance in the absence or presence of dietary confounding.

Results: The “standard model” and the mathematically identical “residual model” estimate the average relative causal effect (i.e., a “substitution” effect) but provide biased estimates even in the absence of confounding. The “energy partition model” estimates the total causal effect but only provides unbiased estimates in the absence of confounding or when all other nutrients have equal effects on the outcome. The “nutrient density model” has an obscure interpretation but attempts to estimate the average relative causal effect rescaled as a proportion of total energy intake. Accurate estimates of both the total and average relative causal effects may instead be obtained by simultaneously adjusting for all dietary components, an approach we term the “all-components model.”

Conclusion: Lack of awareness of the differences in estimand and accuracy among the four modelling approaches may explain some of the apparent heterogeneity among existing nutritional studies and raises serious questions regarding the validity of meta-analyses in which different estimands have been inappropriately pooled.
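The estimand difference between the “standard” and “energy partition” models can be illustrated with a toy simulation. This is our own two-component sketch (exposure A plus one “remaining” component B, with assumed per-kcal effects of 0.04 and 0.01); it demonstrates only that the two models target different quantities, not the bias results reported above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two hypothetical dietary components (kcal/day): exposure A, remainder B
a = rng.normal(500, 100, n)
b = rng.normal(1500, 200, n)
total = a + b

# Outcome depends on both components, with unequal true effects per kcal
y = 0.04 * a + 0.01 * b + rng.normal(size=n)

def coef_on_first(cols, outcome):
    """Coefficient on the first covariate from an OLS fit with intercept."""
    design = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta[1]

standard = coef_on_first([a, total], y)   # "standard model": adjust total energy
partition = coef_on_first([a, b], y)      # "energy partition": adjust remaining energy

# standard ~0.03 = 0.04 - 0.01: a "substitution" effect (swap a kcal of B for A)
# partition ~0.04: the total effect of adding a kcal of A
print(round(standard, 3), round(partition, 3))
```

With only two components, the energy partition model and the “all-components model” coincide; with many components they diverge, which is the article's point.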
Prediction and causal explanation are fundamentally distinct tasks of data analysis. In health applications, this difference can be understood in terms of the difference between prognosis (prediction) and prevention/treatment (causal explanation). Nevertheless, these two concepts are often conflated in practice. We use the framework of generalized linear models (GLMs) to illustrate that predictive and causal queries require distinct processes for their application and subsequent interpretation of results. In particular, we identify five primary ways in which GLMs for prediction differ from GLMs for causal inference: (i) the covariates that should be considered for inclusion in (and possibly exclusion from) the model; (ii) how a suitable set of covariates to include in the model is determined; (iii) which covariates are ultimately selected and what functional form (i.e. parameterization) they take; (iv) how the model is evaluated; and (v) how the model is interpreted. We outline some of the potential consequences of failing to acknowledge and respect these differences, and additionally consider the implications for machine learning (ML) methods. We then conclude with three recommendations that we hope will help ensure that both prediction and causal modelling are used appropriately and to greatest effect in health research.
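Point (i) above, that prediction and causal inference call for different covariates, can be shown with a minimal simulation of our own devising (a mediator M on the path from X to Y, with arbitrary assumed coefficients). Including M improves predictive fit yet changes what the coefficient on X means:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

x = rng.normal(size=n)
m = 0.8 * x + rng.normal(size=n)             # mediator on the causal path X -> M -> Y
y = 0.5 * x + 0.6 * m + rng.normal(size=n)   # total effect of X: 0.5 + 0.8*0.6 = 0.98

def fit(cols):
    """OLS with intercept: return (coefficient on X, R-squared)."""
    design = np.column_stack([np.ones(n)] + cols)
    beta, res, *_ = np.linalg.lstsq(design, y, rcond=None)
    r2 = 1 - res[0] / np.sum((y - y.mean()) ** 2)
    return beta[1], r2

b_causal, r2_causal = fit([x])      # recovers the total causal effect (~0.98)
b_pred, r2_pred = fit([x, m])       # higher R^2, but the X coefficient is now
                                    # only the direct effect (~0.5)

print(round(b_causal, 2), round(b_pred, 2), r2_pred > r2_causal)
```

The better-predicting model is the worse answer to the causal question, and vice versa, which is why the two tasks need distinct model-building processes.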