The treatment of missing data can be difficult in multilevel research because state-of-the-art procedures such as multiple imputation (MI) may require advanced statistical knowledge or a high degree of familiarity with certain statistical software. In the missing data literature, pan has been recommended for MI of multilevel data. In this article, we provide an introduction to MI of multilevel missing data using the R package pan, and we discuss its possibilities and limitations in accommodating typical questions in multilevel research. To make pan more accessible to applied researchers, we make use of the mitml package, which provides a user-friendly interface to the pan package and several tools for managing and analyzing multiply imputed data sets. We illustrate the use of pan and mitml with two empirical examples that represent common applications of multilevel models, and we discuss how these procedures may be used in conjunction with other software.

Keywords: multiple imputation, missing data, multilevel, R

SAGE Open

… behind pan and MI, and we discuss which features of multilevel models must be considered when conducting MI. Finally, we use the mitml package to carry out MI for the empirical example. In that context, we will discuss possibilities for model diagnostics and tests of statistical hypotheses (e.g., model constraints, model comparisons).
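Analyzing multiply imputed data means fitting the analysis model to each imputed data set and then pooling the results into a single set of estimates and inferences. In R, mitml's testEstimates() function handles this step. The following is only a language-neutral sketch in Python, assuming a single scalar parameter and the classic Rubin's rules with the original large-sample degrees of freedom (no small-sample adjustment); the function name pool_rubin and the input values are hypothetical and are not taken from the article.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed data sets (Rubin's rules).

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need >= 2 imputations and one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("sampling variances must be positive")

    q_bar = mean(estimates)             # pooled point estimate
    w = mean(variances)                 # average within-imputation variance
    b = variance(estimates)             # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b             # total variance of the pooled estimate
    r = (1 + 1 / m) * b / w             # relative increase in variance due to missing data
    # Classic large-sample degrees of freedom; infinite when all imputations agree exactly.
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else math.inf
    return {"estimate": q_bar, "se": math.sqrt(t), "df": df, "riv": r}


# Hypothetical regression slopes and squared SEs from m = 3 imputed data sets.
result = pool_rubin([0.50, 0.55, 0.45], [0.01, 0.01, 0.01])
print(result)
```

With these made-up inputs, the pooled estimate is 0.50, the between-imputation variance of about 0.0025 inflates the total variance from 0.01 to about 0.0133, and the degrees of freedom are about 32. The spread of the estimates across imputations is what carries the uncertainty due to missing data into the final inference.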
Multilevel Modeling: An Empirical Example

Multilevel models account for dependencies in the data and allow relationships between variables to be estimated at different levels of analysis and effects to vary across higher level observational units. For the purpose of this article, we assume that the multilevel structure consists of persons (e.g., students, employees) nested within groups (e.g., classes, work groups). If only the regression intercept varies across groups, the model is referred to as a random-intercept model. For example, Chen and Bliese (2002) examined the effects of individual characteristics (e.g., psychological strain) and leadership climate on the self-efficacy of U.S. soldiers, and Kunter, Baumert, and Köller (2007) investigated the effects of student- and group-level ratings of classroom management on students' interest in mathematics. If the effects of additional predictor variables also vary across groups, the model is referred to as a random-slope or random-coefficients model. For example, Hofmann, Morgeson, and Gerras (2003) investigated how the effect of leader-member exchange on safety behavior varied across work teams in the U.S. Army.

The example data set used in this article is from the field of educational research and was taken from the German sample of primary school students who participated in the Progress in International Reading Literacy Study (PIRLS; Bos et al., 2005; Mullis, Martin, Gonzales, & Kennedy, 2003). The data set includes test scores for mathematics achievement (MA) and reading achievement (RA), a measure of cognitive ability (CA), a measure of socioeconomic status (SES), students' ratings of the quality of teaching in their math...
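The random-intercept and random-slope models described above can be written compactly in conventional two-level notation. The notation here is an assumption, since the article's own equations are not part of this excerpt: person i is nested in group j, x_ij is a person-level predictor (e.g., psychological strain in Chen and Bliese, 2002), and z_j is a group-level predictor (e.g., leadership climate). The two models differ only in whether the slope of x_ij carries a group-specific deviation u_1j.

```latex
\begin{aligned}
&\text{Random-intercept model:} \\
&\quad y_{ij} = \gamma_{00} + \gamma_{10}\, x_{ij} + \gamma_{01}\, z_{j} + u_{0j} + e_{ij},
  \qquad u_{0j} \sim N(0, \tau_{00}),\quad e_{ij} \sim N(0, \sigma^{2}) \\[6pt]
&\text{Random-slope model:} \\
&\quad y_{ij} = \gamma_{00} + \gamma_{10}\, x_{ij} + \gamma_{01}\, z_{j} + u_{0j} + u_{1j}\, x_{ij} + e_{ij},
  \qquad \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N\!\left(\mathbf{0},
  \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}\right)
\end{aligned}
```

In the random-intercept model, groups differ only in their overall level (u_0j). In the random-slope model, the effect of x_ij on y_ij is \gamma_{10} + u_{1j} and therefore differs from group to group, as in the varying effects of leader-member exchange studied by Hofmann et al. (2003).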