Principled methods with which to appropriately analyze missing data have long existed; however, broad implementation of these methods remains challenging. In this and 2 companion papers (Am J Epidemiol. 2018;187(3):576-584 and Am J Epidemiol. 2018;187(3):585-591), we discuss issues pertaining to missing data in the epidemiologic literature. We provide details regarding missing-data mechanisms and nomenclature and encourage the conduct of principled analyses through a detailed comparison of multiple imputation and inverse probability weighting. Data from the Collaborative Perinatal Project, a multisite US study conducted from 1959 to 1974, are used to create a masked data-analytical challenge with missing data induced by known mechanisms. We illustrate the deleterious effects of missing data with naive methods and show how principled methods can sometimes mitigate such effects. For example, when data were missing at random, naive methods showed a spurious protective effect of smoking on the risk of spontaneous abortion (odds ratio (OR) = 0.43, 95% confidence interval (CI): 0.19, 0.93), while principled methods, either multiple imputation (OR = 1.30, 95% CI: 0.95, 1.77) or augmented inverse probability weighting (OR = 1.40, 95% CI: 1.00, 1.97), provided estimates closer to the "true" full-data effect (OR = 1.31, 95% CI: 1.05, 1.64). We call for greater acknowledgement of and attention to missing data and for the broad use of principled missing-data methods in epidemiologic research.
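To make the contrast between naive and weighted analyses concrete, the following is a minimal sketch, not the authors' analysis: it estimates a simple outcome proportion rather than an odds ratio, and it uses plain (not augmented) inverse probability weighting. The toy cohort, its counts, and all function names are invented for illustration. Missingness depends only on a fully observed covariate `z`, so the data are missing at random given `z`.

```python
from collections import Counter


def make_toy_cohort():
    """Return (z, y, observed) records. All counts are invented.

    y is stored even for unobserved records only so the full-data
    answer can be computed for comparison; a real analysis would not
    have access to it.
    """
    records = []
    # Stratum z=0: everyone observed, 20% outcome risk.
    records += [(0, 1, True)] * 20 + [(0, 0, True)] * 80
    # Stratum z=1: 60% outcome risk, half the records missing,
    # with missingness unrelated to y inside the stratum.
    records += [(1, 1, True)] * 30 + [(1, 0, True)] * 20
    records += [(1, 1, False)] * 30 + [(1, 0, False)] * 20
    return records


def full_data_mean(records):
    """Outcome proportion if nothing had been missing."""
    return sum(y for _, y, _ in records) / len(records)


def complete_case_mean(records):
    """Naive estimate: drop every record with a missing outcome."""
    ys = [y for _, y, obs in records if obs]
    return sum(ys) / len(ys)


def ipw_mean(records):
    """Weight each observed record by 1 / P(observed | z)."""
    total = Counter(z for z, _, _ in records)
    seen = Counter(z for z, _, obs in records if obs)
    # total[z] / seen[z] is the inverse of the observed fraction in stratum z.
    weight = {z: total[z] / seen[z] for z in seen}
    numerator = sum(weight[z] * y for z, y, obs in records if obs)
    denominator = sum(weight[z] for z, _, obs in records if obs)
    return numerator / denominator


if __name__ == "__main__":
    cohort = make_toy_cohort()
    print("full data:    ", full_data_mean(cohort))
    print("complete case:", complete_case_mean(cohort))
    print("IPW:          ", ipw_mean(cohort))
```

In this toy cohort the complete-case estimate is pulled toward the fully observed low-risk stratum, while the weighted estimate recovers the full-data proportion because the weights restore each stratum to its original size.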
Epidemiologic studies are frequently susceptible to missing information. Omitting observations with missing variables remains a common strategy in epidemiologic studies, yet this simple approach can often severely bias parameter estimates of interest if the values are not missing completely at random. Even when missingness is completely random, complete-case analysis can reduce the efficiency of estimated parameters, because large amounts of available data are simply tossed out with the incomplete observations. Alternative methods for mitigating the influence of missing information, such as multiple imputation, are becoming an increasingly popular strategy in order to retain all available information, reduce potential bias, and improve efficiency in parameter estimation. In this paper, we describe the theoretical underpinnings of multiple imputation, and we illustrate application of this method as part of a collaborative challenge to assess the performance of various techniques for dealing with missing data (Am J Epidemiol. 2018;187(3):568-575). We detail the steps necessary to perform multiple imputation on a subset of data from the Collaborative Perinatal Project (1959-1974), where the goal is to estimate the odds of spontaneous abortion associated with smoking during pregnancy.
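The final step of any multiple imputation analysis is combining the estimates from the separately analyzed imputed data sets. A minimal sketch of that pooling step using Rubin's rules is shown here; the function name and the example numbers are hypothetical, and the abstract does not describe the authors' own implementation. The inputs are assumed to be one point estimate (for example, a log odds ratio) and its squared standard error from each imputed data set.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation results with Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, in the same order
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 imputations and matching lengths")
    pooled = mean(estimates)        # average of the point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical log odds ratios and variances from 3 imputed data sets.
    log_or, total_var = pool_rubin([0.2, 0.3, 0.4], [0.01, 0.01, 0.01])
    print(log_or, total_var)
```

The `(1 + 1 / m)` factor inflates the between-imputation variance to account for using a finite number of imputations. Interval estimates would additionally need the reference t distribution and its degrees of freedom, which this sketch leaves out.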
Purpose: It is thought that total energy intake in women is increased during the luteal versus follicular phase of the menstrual cycle; however, less is understood regarding changes in diet composition (i.e., macro- and micronutrient intakes) across the cycle. The aim of this study was to investigate changes in macronutrient, micronutrient, and food group intakes across phases of the menstrual cycle among healthy women, and to assess whether these patterns differ by ovulatory status. Methods: The BioCycle study (2005–2007) was a prospective cohort study of 259 healthy regularly menstruating women age 18–44 who were followed for up to two menstrual cycles. Dietary intake was measured using 24-h dietary recalls, and food cravings were assessed via questionnaire, up to four times per cycle, corresponding to menses, mid-follicular, expected ovulation, and luteal phases. Linear mixed models adjusting for total energy intake were used to evaluate changes across the cycle. Results: Total protein (P = 0.03), animal protein (P = 0.05), and percent of caloric intake from protein (P = 0.02) were highest during the mid-luteal phase compared to the peri-ovulatory phase. There were also significant increases in appetite, craving for chocolate, craving for sweets in general, craving for salty flavor, and total craving score during the late luteal phase compared to the menstrual, follicular, and ovulatory phases (P < 0.001). Conclusions: Our findings suggest an increased intake of protein, and specifically animal protein, as well as an increase in reported food cravings, during the luteal phase of the menstrual cycle independent of ovulatory status. These results highlight a plausible link between macronutrient intake and menstrual cycle phase.
Lower and higher AMH values were not associated with fecundability in unassisted conceptions in a cohort of fecund women with a history of one or two prior losses. Our data do not support routine AMH testing for preconception counseling in young, fecund women.
IMPORTANCE Nausea and vomiting during pregnancy have been associated with a reduced risk for pregnancy loss. However, most prior studies enrolled women with clinically recognized pregnancies, thereby missing early losses. OBJECTIVE To examine the association of nausea and vomiting during pregnancy with pregnancy loss. DESIGN, SETTING, AND PARTICIPANTS A randomized clinical trial, Effects of Aspirin in Gestation and Reproduction, enrolled women with 1 or 2 prior pregnancy losses at 4 US clinical centers from June 15, 2007, to July 15, 2011. This secondary analysis was limited to women with a pregnancy confirmed by positive results of a human chorionic gonadotropin (hCG) test. Nausea symptoms were ascertained from daily preconception and pregnancy diaries for gestational weeks 2 to 8. From weeks 12 to 36, participants completed monthly questionnaires summarizing symptoms for the preceding 4 weeks. A week-level variable included nausea only, nausea with vomiting, or neither. MAIN OUTCOMES AND MEASURES Peri-implantation (hCG-detected pregnancy without ultrasonographic evidence) and clinically recognized pregnancy losses. RESULTS A total of 797 women (mean [SD] age, 28.7 [4.6] years) had an hCG-confirmed pregnancy. Of these, 188 pregnancies (23.6%) ended in loss. At gestational week 2, 73 of 409 women (17.8%) reported nausea without vomiting and 11 of 409 women (2.7%), nausea with vomiting. By week 8, the proportions increased to 254 of 443 women (57.3%) and 118 of 443 women (26.6%), respectively. Hazard ratios (HRs) for nausea (0.50; 95% CI, 0.32-0.80) and nausea with vomiting (0.25; 95% CI, 0.12-0.51) were inversely associated with pregnancy loss. The associations of nausea (HR, 0.59; 95% CI, 0.29-1.20) and nausea with vomiting (HR, 0.51; 95% CI, 0.11-2.25) were similar for peri-implantation losses but were not statistically significant. Nausea (HR, 0.44; 95% CI, 0.26-0.74) and nausea with vomiting (HR, 0.20; 95% CI, 0.09-0.44) were associated with a reduced risk for clinical pregnancy loss. CONCLUSIONS AND RELEVANCE Among women with 1 or 2 prior pregnancy losses, nausea and vomiting were common very early in pregnancy and were associated with a reduced risk for pregnancy loss. These findings overcome prior analytic and design limitations and represent the most definitive data available to date indicating the protective association of nausea and vomiting in early pregnancy and the risk for pregnancy loss. TRIAL REGISTRATION clinicaltrials.gov Identifier: NCT00467363
Context: Inflammation is linked to causes of infertility. Low-dose aspirin (LDA) may improve reproductive success in women with chronic, low-grade inflammation. Objective: To investigate the effect of preconception-initiated LDA on pregnancy rate, pregnancy loss, live birth rate, and inflammation during pregnancy. Design: Stratified secondary analysis of a multicenter, block-randomized, double-blind, placebo-controlled trial. Setting: Four US academic medical centers, 2007 to 2012. Participants: Healthy women aged 18 to 40 years (N = 1228) with one to two prior pregnancy losses actively attempting to conceive. Intervention: Preconception-initiated, daily LDA (81 mg) or matching placebo taken up to six menstrual cycles attempting pregnancy and through 36 weeks’ gestation in women who conceived. Main Outcome Measures: Confirmed pregnancy, live birth, and pregnancy loss were compared between LDA and placebo, stratified by tertile of preconception, preintervention serum high-sensitivity C-reactive protein (hsCRP) (low, <0.70 mg/L; middle, 0.70 to <1.95 mg/L; high, ≥1.95 mg/L). Results: Live birth occurred in 55% of women overall. The lowest pregnancy and live birth rates occurred among the highest hsCRP tertile receiving placebo (44% live birth). LDA increased live birth among high-hsCRP women to 59% (relative risk, 1.35; 95% confidence interval, 1.08 to 1.67), similar to rates in the lower and mid-CRP tertiles. LDA did not affect clinical pregnancy or live birth in the low (live birth: 59% LDA, 54% placebo) or midlevel hsCRP tertiles (live birth: 59% LDA, 59% placebo). Conclusions: In women attempting conception with elevated hsCRP and prior pregnancy loss, LDA may increase clinical pregnancy and live birth rates compared with women without inflammation and reduce hsCRP elevation during pregnancy.
Summary: Epidemiological studies involving biomarkers are often hindered by prohibitively expensive laboratory tests. Strategically pooling specimens prior to performing these lab assays has been shown to effectively reduce cost with minimal information loss in a logistic regression setting. When the goal is to perform regression with a continuous biomarker as the outcome, regression analysis of pooled specimens may not be straightforward, particularly if the outcome is right-skewed. In such cases, we demonstrate that a slight modification of a standard multiple linear regression model for poolwise data can provide valid and precise coefficient estimates when pools are formed by combining biospecimens from subjects with identical covariate values. When these x-homogeneous pools cannot be formed, we propose a Monte Carlo Expectation Maximization (MCEM) algorithm to compute maximum likelihood estimates (MLEs). Simulation studies demonstrate that these analytical methods provide essentially unbiased estimates of coefficient parameters as well as their standard errors when appropriate assumptions are met. Furthermore, we show how one can utilize the fully observed covariate data to inform the pooling strategy, yielding a high level of statistical efficiency at a fraction of the total lab cost.
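The idea of x-homogeneous pools, meaning pools that combine specimens only from subjects with identical covariate values, can be sketched as a simple grouping step. This is an illustrative assumption about how such pools might be assembled, not the authors' pooling strategy; the function name, the subject identifiers, and the choice to keep undersized leftover pools are all hypothetical.

```python
from collections import defaultdict


def form_x_homogeneous_pools(subjects, pool_size):
    """Group subjects into pools that share identical covariate values.

    subjects: iterable of (subject_id, covariates) pairs, where
              covariates is a sequence of fully observed values
    pool_size: maximum number of specimens per pool
    Returns a list of (covariates, [subject_ids]) pairs. Leftover
    subjects in a covariate group form a smaller final pool.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    by_covariates = defaultdict(list)
    for subject_id, covariates in subjects:
        by_covariates[tuple(covariates)].append(subject_id)
    pools = []
    for covariates, ids in by_covariates.items():
        # Split each covariate group into consecutive chunks.
        for start in range(0, len(ids), pool_size):
            pools.append((covariates, ids[start:start + pool_size]))
    return pools


if __name__ == "__main__":
    # Hypothetical subjects with two binary covariates each.
    cohort = [("a", (0, 1)), ("b", (0, 1)), ("c", (1, 1)), ("d", (0, 1))]
    for covariates, ids in form_x_homogeneous_pools(cohort, pool_size=2):
        print(covariates, ids)
```

With continuous or high-dimensional covariates, exact matches become rare and most pools would contain a single specimen, which is the situation in which the abstract turns to the MCEM approach instead.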
Background: Correlated data are ubiquitous in epidemiologic research, particularly in nutritional and environmental epidemiology where mixtures of factors are studied. Our objective is to demonstrate how highly correlated data arise in epidemiologic research and provide guidance on how to proceed analytically when faced with highly correlated data utilizing a directed acyclic graph approach. Methods: We identified three fundamental structural scenarios in which high correlation between a given variable and the exposure can arise: intermediates, confounders, and colliders. For each of these scenarios we evaluated the consequences of increasing correlation between the given variable and the exposure on the bias and variance for the total effect of the exposure on the outcome using unadjusted and adjusted models. We derived closed form solutions for continuous outcomes using linear regression and empirically present our findings for binary outcomes using logistic regression. Results: For models properly specified, total effect estimates remained unbiased even when there was almost perfect correlation between the exposure and a given intermediate, confounder, or collider. In general, as the correlation increased the variance of the parameter estimate for the exposure in the adjusted models increased, while in the unadjusted models it increased to a lesser extent or decreased. Conclusion: Our findings highlight the importance of considering the causal framework under study when specifying regression models. Strategies that do not take into consideration the causal structure may lead to biased effect estimation for the original question of interest, even under high correlation.
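The variance increase in adjusted models can be illustrated with the textbook variance inflation factor for ordinary least squares with two predictors. This is a standard general result and is offered only as a sketch; it is not the closed-form solution derived in the paper, and the function name is hypothetical. With correlation `rho` between the exposure and the adjustment variable, the sampling variance of the exposure coefficient is multiplied by 1 / (1 - rho^2) relative to the uncorrelated case with the same residual variance.

```python
def variance_inflation(rho):
    """Variance inflation factor for one of two correlated predictors.

    rho: correlation between the exposure and the adjustment variable,
         strictly between -1 and 1
    Returns 1 / (1 - rho**2).
    """
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho ** 2)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 0.99):
        print(rho, variance_inflation(rho))
```

The factor grows without bound as the correlation approaches 1, which matches the qualitative pattern in the abstract: adjustment for a nearly collinear variable leaves a correctly specified estimate unbiased but increasingly imprecise.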