Multiple imputation (MI) is one of the principled methods for dealing with missing data. In addition, multilevel models have become a standard tool for analyzing the nested data structures that result when lower level units (e.g., employees) are nested within higher level collectives (e.g., work groups). When applying MI to multilevel data, it is important that the imputation model takes the multilevel structure into account. In the present paper, based on theoretical arguments and computer simulations, we provide guidance on using MI in the context of several classes of multilevel models, including models with random intercepts, random slopes, cross-level interactions (CLIs), and missing data in categorical and group-level variables. Our findings suggest that, oftentimes, several approaches to MI provide an effective treatment of missing data in multilevel research. Yet we also note that the current implementations of MI still have room for improvement when handling missing data in explanatory variables in models with random slopes and CLIs. We identify areas for future research and provide recommendations for research practice, along with a number of step-by-step examples for the statistical software R.
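Whichever imputation model is used, MI produces m completed data sets. Each is analyzed separately, and the m sets of results are then combined. The Python sketch below implements the standard combining rules (Rubin's rules) for a single parameter. It is an illustration under that assumption, not the procedure of any particular R package, and the function name `pool_rubin` is hypothetical. The inputs are the per-imputation estimates and their squared standard errors.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine one parameter across m imputed data sets using Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed data set
    variances: squared standard errors U_1..U_m from the same analyses
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    if u_bar <= 0:
        raise ValueError("within-imputation variances must be positive")
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance
    # relative increase in variance due to missing data
    r = (1 + 1 / m) * b / u_bar
    # Rubin's (1987) degrees of freedom for the t reference distribution
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else math.inf
    return q_bar, math.sqrt(t), df
```

For example, `pool_rubin([0.50, 0.55, 0.45], [0.010, 0.012, 0.011])` gives a pooled estimate of 0.5 with a standard error of about 0.12. The standard error is larger than any single-imputation value because it includes the between-imputation variance.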
Multiple imputation is a tool for parameter estimation and inference with partially observed data, and it is used increasingly widely in medical and social research. When the data to be imputed are correlated or have a multilevel structure (e.g., repeated observations on patients, or schoolchildren nested in classes within schools within educational districts), the imputation model needs to include this structure. Here we introduce our joint modelling package for multiple imputation of multilevel data, jomo, which uses a multivariate normal model fitted by Markov chain Monte Carlo (MCMC). Compared to previous packages for multilevel imputation, e.g., pan, jomo adds the facility to (i) handle and impute categorical variables using a latent normal structure, (ii) impute level-2 variables, and (iii) allow for cluster-specific covariance matrices, including the option to give them an inverse-Wishart distribution at level 2. The package uses C routines to speed up the computations and has been extensively validated in simulation studies, both by ourselves and by others.
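The latent normal idea for categorical variables can be sketched in a few lines. A binary item is treated as the sign of an unobserved normal variable, and a K-category item is represented by K - 1 latent normals. The mapping rule used below is an assumption for illustration, not jomo's documented parameterization. Under that rule, the reference category is returned when all latent values are negative; otherwise the category with the largest latent value is returned. All function names are hypothetical.

```python
import random
from statistics import NormalDist

def latent_to_category(z):
    """Map K-1 latent normal values to one of K categories (0 = reference).

    Assumed rule: if every latent value is negative, return the reference
    category 0; otherwise return 1 + the index of the largest latent value.
    """
    best = max(range(len(z)), key=lambda k: z[k])
    return 0 if z[best] < 0 else best + 1

def simulate_binary_share(mu, n, seed=1):
    """Share of ones when y = 1 if z >= 0, with z ~ N(mu, 1)."""
    rng = random.Random(seed)
    ones = sum(latent_to_category([rng.gauss(mu, 1.0)]) for _ in range(n))
    return ones / n

def expected_share(mu):
    """Implied P(y = 1) under the latent normal (probit) model: Phi(mu)."""
    return NormalDist().cdf(mu)
```

With a latent mean of 0.5, `simulate_binary_share(0.5, 20000)` should land close to `expected_share(0.5)`, which is about 0.69. This is why imputing the continuous latent variables and then mapping them back yields valid categorical values.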
The treatment of missing data can be difficult in multilevel research because state-of-the-art procedures such as multiple imputation (MI) may require advanced statistical knowledge or a high degree of familiarity with certain statistical software. In the missing data literature, pan has been recommended for MI of multilevel data. In this article, we provide an introduction to MI of multilevel missing data using the R package pan, and we discuss its possibilities and limitations in accommodating typical questions in multilevel research. To make pan more accessible to applied researchers, we make use of the mitml package, which provides a user-friendly interface to the pan package and several tools for managing and analyzing multiply imputed data sets. We illustrate the use of pan and mitml with two empirical examples that represent common applications of multilevel models, and we discuss how these procedures may be used in conjunction with other software.

Keywords: multiple imputation, missing data, multilevel, R

SAGE Open

… behind pan and MI, and we discuss which features of multilevel models must be considered when conducting MI. Finally, we use the mitml package to carry out MI for the empirical example. In that context, we discuss possibilities for model diagnostics and tests of statistical hypotheses (e.g., model constraints, model comparisons).

Multilevel Modeling: An Empirical Example

Multilevel models account for dependencies in the data. They allow relationships between variables to be estimated at different levels of analysis, and they allow effects to vary across higher level observational units. For the purpose of this article, we assume that the multilevel structure consists of persons (e.g., students, employees) nested within groups (e.g., classes, work groups). If only the regression intercept varies across groups, the model is referred to as a random-intercept model.
For example, Chen and Bliese (2002) examined the effects of individual characteristics (e.g., psychological strain) and leadership climate on the self-efficacy of U.S. soldiers. Kunter, Baumert, and Köller (2007) investigated the effects of student- and group-level ratings of classroom management on students' interest in mathematics. If the effects of additional predictor variables vary across groups, the model is referred to as a random-slope or random-coefficients model. For example, Hofmann, Morgeson, and Gerras (2003) investigated varying effects of leader-member exchange on safety behavior across work teams in the U.S. Army.

The example data set used in this article comes from the field of educational research. It was taken from the German sample of primary school students who participated in the Progress in International Reading Literacy Study (PIRLS; Bos et al., 2005; Mullis, Martin, Gonzales, & Kennedy, 2003). The data set includes test scores in mathematics (MA) and reading achievement (RA), a measure of cognitive ability (CA), a measure of socioeconomic status (SES), students' ratings of the quality of teaching in their math...
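The random-intercept and random-slope models described above can be written in a common two-level notation. This notation is assumed here because the article's own notation is not shown in this excerpt. Persons are indexed by i, groups by j, and x is a person-level predictor. Group-level predictors and cross-level interactions are omitted for brevity.

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j}\,x_{ij} + e_{ij}, && e_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j}, && u_{0j} \sim N(0, \tau_0^2) \\
                     & \beta_{1j} = \gamma_{10} + u_{1j}, && u_{1j} \sim N(0, \tau_1^2)
\end{aligned}
```

In the random-intercept model, $u_{1j} = 0$ for all groups, so only the intercept varies. In the random-slope model, $\tau_1^2 > 0$, and $u_{0j}$ and $u_{1j}$ are usually allowed to covary. For the random-intercept model without predictors, the intraclass correlation is $\rho = \tau_0^2 / (\tau_0^2 + \sigma^2)$.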
Multiple imputation is a widely recommended means of addressing the problem of missing data in psychological research. An often-neglected requirement of this approach is that the imputation model used to generate the imputed values must be at least as general as the analysis model. For multilevel designs in which lower level units (e.g., students) are nested within higher level units (e.g., classrooms), this means that the multilevel structure must be taken into account in the imputation model. In the present article, we compare different strategies for multiply imputing incomplete multilevel data using mathematical derivations and computer simulations. We show that ignoring the multilevel structure in the imputation may lead to substantial negative bias in estimates of intraclass correlations, as well as biased estimates of regression coefficients in multilevel models. We also demonstrate that an ad hoc strategy that includes dummy indicators in the imputation model to represent the multilevel structure may be problematic under certain conditions (e.g., small groups, low intraclass correlations). Imputation based on a multivariate linear mixed-effects model was the only strategy to produce valid inferences under most of the conditions investigated in the simulation study. Data from an educational psychology research project are also used to illustrate the impact of the various multiple imputation strategies.
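The negative bias in the intraclass correlation (ICC) can be illustrated with a small simulation. The sketch below assumes balanced groups, a random-intercept data-generating model, and values missing completely at random. Missing values are filled with a single draw from a normal model that ignores group membership. A full MI would repeat this step m times and also draw the model parameters, but one draw is enough to show the direction of the bias. All names and settings are illustrative.

```python
import random
from statistics import mean, stdev

def icc_anova(groups):
    """One-way ANOVA estimator of the ICC for equal-sized groups."""
    j, n = len(groups), len(groups[0])
    grand = mean(v for g in groups for v in g)
    means = [mean(g) for g in groups]
    msb = n * sum((m - grand) ** 2 for m in means) / (j - 1)
    msw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (j * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

def simulate(j=200, n=10, icc=0.3, p_miss=0.5, seed=2):
    """Return (complete-data ICC, ICC after single-level imputation)."""
    rng = random.Random(seed)
    tau, sigma = icc ** 0.5, (1 - icc) ** 0.5   # SDs of group effect and residual
    data = []
    for _ in range(j):
        u = rng.gauss(0, tau)                   # random intercept of this group
        data.append([u + rng.gauss(0, sigma) for _ in range(n)])
    # MCAR: each value is missing with probability p_miss
    miss = [[rng.random() < p_miss for _ in g] for g in data]
    observed = [v for g, mg in zip(data, miss) for v, m in zip(g, mg) if not m]
    mu, sd = mean(observed), stdev(observed)
    # Single-level imputation: one draw from N(mu, sd^2), ignoring group membership
    imputed = [
        [rng.gauss(mu, sd) if m else v for v, m in zip(g, mg)]
        for g, mg in zip(data, miss)
    ]
    return icc_anova(data), icc_anova(imputed)
```

With the default settings, the complete-data ICC should be close to 0.3. After single-level imputation, the ICC should drop to roughly 0.3 × (1 − 0.5)² ≈ 0.075. This happens because, within a group, only pairs in which both values were observed keep their covariance.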
Introduction: Evidence on the effects of Internet-based interventions to treat subthreshold depression (sD) and prevent the onset of major depression (MDD) is inconsistent. Objective: We conducted an individual participant data meta-analysis to determine differences between intervention and control groups (IG, CG) in depressive symptom severity (DSS), treatment response, close to symptom-free status, symptom deterioration, and MDD onset, as well as moderators of intervention outcomes. Methods: Randomized controlled trials were identified through systematic searches of PubMed, PsycINFO, Embase, and the Cochrane Library. Multilevel regression analyses were used to examine efficacy and moderators. Results: Seven trials (2,186 participants) were included. Compared with the CG, the IG was superior in DSS at all measurement points (posttreatment, 6–12 weeks: Hedges’ g = 0.39 [95% CI: 0.25–0.53]; follow-up 1, 3–6 months: g = 0.30 [95% CI: 0.15–0.45]; follow-up 2, 12 months: g = 0.27 [95% CI: 0.07–0.47]). Significantly more participants in the IG than in the CG reached response and close to symptom-free status at all measurement points. A significant difference in symptom deterioration between the groups was found at the posttreatment assessment and at follow-up 2. Incidence rates for MDD onset within 12 months were lower in the IG (19%) than in the CG (26%). Higher initial DSS and older age were identified as moderators of the intervention effect on DSS. Conclusions: Our findings provide evidence that Internet-based interventions are a suitable low-threshold option for treating individuals with sD and reducing the incidence of MDD. This might be particularly true for older people with a substantial symptom burden.
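Hedges' g, the effect size reported above, is a standardized mean difference with a small-sample correction. The abstract does not say how g was derived from the multilevel models, so the sketch below is only the textbook two-group formula. It assumes a pooled-SD denominator and the common approximate correction factor J = 1 − 3/(4·df − 1). Because lower DSS is better, the groups must be entered so that positive values favor the IG. The function name is hypothetical.

```python
import math

def hedges_g(m1, m2, sd1, sd2, n1, n2):
    """Standardized mean difference with Hedges' small-sample correction.

    Sign convention used here: positive g means group 1 has the higher mean.
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled        # Cohen's d with pooled SD
    j = 1 - 3 / (4 * df - 1)         # approximate correction factor J
    return j * d
```

For symptom scores, calling `hedges_g` with the control group first yields positive values when the intervention group scores lower. With 50 participants per arm, the correction shrinks d by less than 1%, which is why g and d are nearly identical in large trials.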