An increasing number of researchers pool, harmonize, and analyze survey data from different survey providers to address their research questions. They aim to study heterogeneity between groups over long periods or to examine small subgroups; research questions that can be impossible to answer with a single survey. In medicine and psychology, this combination or pooling of data is known as individual participant data (IPD) meta-analysis; in sociology, it is understood as part of ex-post survey harmonization (Granda et al. 2010). However, in medicine and psychology, most original studies focus on treatment or intervention effects and apply experimental research designs to reach causal conclusions. In contrast, many sociological or economic studies are nonexperimental. Compared to experimental data, survey-based data are subject to complex sampling and nonresponse. Ignoring the complex sampling design can lead to biased population inferences, not only for population means and shares but also for regression coefficients, which are widely used in the social sciences (DuMouchel and Duncan 1983; Solon et al. 2013). To account for complex sampling schemes or non-ignorable unit nonresponse, survey-based data often come with survey weights. But how should survey weights be used after pooling different surveys? We build upon the work of DuMouchel and Duncan (1983) and Solon et al. (2013) on survey-weighted regression analysis with a single data set. Through Monte Carlo (MC) simulations, we show that models with endogenous sampling or heterogeneous effects require survey weighting to obtain approximately unbiased estimates after ex-post survey harmonization. Second, we address a list of methodological questions: Do survey-weighted one-stage and two-stage (meta-)analytical approaches perform differently? Is it possible to include random effects, especially if we have to assume study heterogeneity?
Another challenging methodological question is the inclusion of random effects in a one-stage analysis. Our simulations show that two-stage analysis is biased if the variation of the weights is high, whereas one-stage analysis remains unbiased. We also show that including random effects in a one-stage analysis is challenging but feasible: in most cases, the weights must be transformed. Apart from the MC simulations, we also illustrate the difference between two-stage and one-stage approaches with real-world data on same-sex couples in Germany.
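The contrast between the two pooling strategies can be sketched with simulated data. The following is a minimal illustration, not the paper's actual simulation design: it assumes two hypothetical surveys with a common slope of 0.5 and arbitrary survey weights, and uses a closed-form weighted least-squares estimator plus inverse-variance pooling to stand in for the full (meta-)analytical machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

def wls(X, y, w):
    """Closed-form weighted least squares: beta = (X'WX)^{-1} X'Wy."""
    XtW = X.T * w                      # scales each observation's row by its weight
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    resid = y - X @ beta
    # Simple model-based variance estimate, for illustration only
    sigma2 = np.sum(w * resid**2) / (np.sum(w) - X.shape[1])
    cov = sigma2 * np.linalg.inv(XtW @ X)
    return beta, cov

# Two hypothetical surveys drawn from the same population model y = 1 + 0.5x + e
surveys = []
for k in range(2):
    n = 500
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(size=n)
    w = rng.uniform(0.2, 5.0, size=n)  # invented survey weights
    surveys.append((np.column_stack([np.ones(n), x]), y, w))

# Two-stage: estimate each survey separately, then pool by inverse variance
slopes, variances = [], []
for X, y, w in surveys:
    b, cov = wls(X, y, w)
    slopes.append(b[1])
    variances.append(cov[1, 1])
inv_var = 1.0 / np.array(variances)
beta_two_stage = np.sum(inv_var * np.array(slopes)) / np.sum(inv_var)

# One-stage: pool the person-level records and run a single weighted regression
X_all = np.vstack([s[0] for s in surveys])
y_all = np.concatenate([s[1] for s in surveys])
w_all = np.concatenate([s[2] for s in surveys])
beta_one_stage = wls(X_all, y_all, w_all)[0][1]
```

With moderate weight variation, as here, both estimates land close to the true slope; the paper's point is that this agreement breaks down for the two-stage estimator when the variation of the weights is high.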
The International Program in Survey and Data Science (IPSDS) is an online educational program that can be attended through the Joint Program in Survey Methodology (JPSM) at the University of Maryland (UMD) or as a part-time Master of Applied Data Science & Measurement (MDM) at the University of Mannheim and Mannheim Business School (MBS). It is targeted towards and attended by working professionals involved or interested in data collection and data analysis, including those working in official statistics. The program conveys competencies in the areas of data collection, data analysis, data storage, and data visualization. The faculty of the program includes researchers and lecturers from both the University of Maryland and the University of Mannheim, as well as from other organizations in the field of official statistics such as Destatis and Statistics Netherlands. The program was awarded the label of 'European Master in Official Statistics (EMOS)' under conditions in May 2021. In this article, we summarize the methodological and statistical competencies needed in official statistics and show how IPSDS covers this set of skills. We present the flipped classroom design used for the IPSDS program and demonstrate that it is especially suited for students who are working professionals at the same time.
Many phenomena in the social and medical sciences can be described as events, meaning that a qualitative change occurs at some particular point in time. Typical research questions focus on whether, when, and under which circumstances events occur. In the social sciences, discrete-time-to-event models (Discrete-Time Survival Analysis Models, DTSAMs) are popular. Data analyzed through DTSAMs are in the so-called person-period format. The model is a logistic regression model with the event indicator as the dependent variable. However, like many other statistical applications, the practical analysis of discrete-time survival data is challenged by missing data in one or more covariates. Negative consequences of such missing data range from efficiency losses to bias. A popular approach to circumvent these unwanted effects of missing data is multiple imputation (MI). With multiple imputation, it is crucial to include outcome information in the model for imputing partially observed covariates. Unfortunately, this is not straightforward in the case of DTSAMs, since we (a) usually have a partly observed (left- or right-censored) outcome, (b) do not have one outcome variable but two, the event indicator and the time-to-event, and (c) have to decide whether to impute while the data set is still in person format or after transformation into person-period format, especially when looking at time-invariant information. Since there is little guidance on how to incorporate the observed outcome information in the imputation model of missing covariates in discrete-time survival analysis, we explore different approaches using fully conditional specification (FCS) MI (van Buuren 2006) and the newer substantive model compatible (SMC-)FCS MI (Bartlett et al. 2014). These approaches vary in the complexity with which we incorporate the outcome into the imputation model, the FCS algorithm used, and the data format used during the imputation.
We compare the methods using Monte Carlo simulations and provide a practical example using data from the German Family Panel pairfam. We confirm the results of White and Royston (2009) and Beesley et al. (2016) that imputing conditional on the (partly imputed) uncensored time-to-event yields high bias. A compatible imputation model for SMC-FCS MI with data in person-period format proves to be the key to imputations with good performance results under different simulation conditions.
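The person-period format referred to in point (c) can be illustrated with a short, self-contained sketch. The function name and record layout below are invented for illustration; the expanded rows are what a DTSAM's logistic regression would take as input, with the binary event indicator `y` as the dependent variable.

```python
def to_person_period(records):
    """Expand person-format records (id, time, event) into person-period rows.

    Each person contributes one row per discrete period up to and including
    the observed time. The event indicator y is 1 only in the final period
    of a spell that ends in an event (event=1); right-censored spells
    (event=0) have y = 0 in every period.
    """
    rows = []
    for pid, time, event in records:
        for t in range(1, time + 1):
            rows.append({"id": pid, "period": t,
                         "y": int(event == 1 and t == time)})
    return rows

# Person format: person 1 experiences the event in period 3,
# person 2 is right-censored after period 2
person_data = [(1, 3, 1), (2, 2, 0)]
pp_rows = to_person_period(person_data)
# person 1 -> y = 0, 0, 1 over periods 1..3; person 2 -> y = 0, 0
```

Whether partially observed time-invariant covariates are imputed before or after this expansion is exactly the design decision the abstract raises, since after expansion each person's covariate value is duplicated across rows.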
In this study, we demonstrate how supervised learning can extract interpretable survey motivation measurements from a large number of responses to an open-ended question. We manually coded a subsample of 5,000 responses to an open-ended question on survey motivation from the GESIS Panel (25,000 responses in total) and used supervised machine learning to classify the remaining responses. We demonstrate that the responses on survey motivation in the GESIS Panel are particularly well suited for automated classification, since they are mostly one-dimensional. The evaluation of the test set also indicates very good overall performance. We present the pre-processing steps and methods we used for our data, and by discussing other popular options that might be more suitable in other cases, we also generalize beyond our use case. We also discuss various minor problems, such as the need for spelling correction. Finally, we showcase the analytic potential of the resulting categorization of panelists' motivation through an event history analysis of panel dropout. The analytical results allow a close look at respondents' motivations: they span a wide range, from the urge to help, to interest in the questions or the incentive, to the wish to influence those in power through participation. We conclude our paper by discussing the re-usability of the hand-coded responses for other surveys that include open questions similar to the GESIS Panel question.
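The workflow of training a classifier on a hand-coded subsample and applying it to the remaining responses can be sketched generically. The snippet below is a toy multinomial naive Bayes over bag-of-words features, with invented motivation categories and example texts; it stands in for whichever supervised learner and pre-processing pipeline a study actually uses.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Minimal multinomial naive Bayes on bag-of-words features."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # per-label word frequencies
        self.label_counts = Counter(labels)      # class priors
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        words = text.lower().split()
        n_total = sum(self.label_counts.values())
        best, best_lp = None, -math.inf
        for label, n_docs in self.label_counts.items():
            lp = math.log(n_docs / n_total)      # log prior
            total = sum(self.word_counts[label].values())
            for w in words:
                # Laplace smoothing over the shared vocabulary
                lp += math.log((self.word_counts[label][w] + 1)
                               / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Hand-coded training subsample (categories and texts are invented examples)
coded_texts = ["i want to help science", "the incentive was nice",
               "helping research matters", "money incentive payment"]
coded_labels = ["altruism", "incentive", "altruism", "incentive"]
clf = TinyNaiveBayes().fit(coded_texts, coded_labels)
```

A real application would replace this toy model with a tuned pipeline (tokenization, spelling correction, a stronger learner) and evaluate it on a held-out test set before classifying the uncoded responses.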