Despite the growing popularity of propensity score (PS) methods in epidemiology, relatively little has been written in the epidemiologic literature about the problem of variable selection for PS models. The authors present the results of two simulation studies designed to help epidemiologists gain insight into the variable selection problem in a PS analysis. The simulation studies illustrate how the choice of variables that are included in a PS model can affect the bias, variance, and mean squared error of an estimated exposure effect. The results suggest that variables that are unrelated to the exposure but related to the outcome should always be included in a PS model. The inclusion of these variables will decrease the variance of an estimated exposure effect without increasing bias. In contrast, including variables that are related to the exposure but not to the outcome will increase the variance of the estimated exposure effect without decreasing bias. In very small studies, the inclusion of variables that are strongly related to the exposure but only weakly related to the outcome can be detrimental to an estimate in a mean squared error sense. The addition of these variables removes only a small amount of bias but can increase the variance of the estimated exposure effect. These simulation studies and other analytical results suggest that standard model-building tools designed to create good predictive models of the exposure will not always lead to optimal PS models, particularly in small studies.
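The qualitative pattern described above can be reproduced with a small Monte Carlo sketch. This is not the authors' simulation design: the three binary covariates, the coefficient values, the sample size, the continuous outcome, and the function names (`simulate_cohort`, `ipw_estimate`, `run_simulation`) are all assumptions chosen for illustration. The PS model here is saturated (the exposure proportion within each covariate pattern), and the effect is estimated by inverse probability weighting.

```python
import math
import random
from statistics import mean, pvariance

TRUE_EFFECT = 1.0  # assumed true exposure effect on a continuous outcome


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_cohort(n, rng):
    """Draw one cohort of (x1, x2, x3, exposure, outcome) tuples."""
    rows = []
    for _ in range(n):
        x1 = int(rng.random() < 0.5)  # confounder: affects exposure and outcome
        x2 = int(rng.random() < 0.5)  # affects the outcome only
        x3 = int(rng.random() < 0.5)  # affects the exposure only
        p_exposed = expit(1.0 * (x1 - 0.5) + 3.0 * (x3 - 0.5))
        a = int(rng.random() < p_exposed)
        y = TRUE_EFFECT * a + 1.0 * x1 + 2.0 * x2 + rng.gauss(0.0, 1.0)
        rows.append((x1, x2, x3, a, y))
    return rows


def ipw_estimate(rows, ps_columns):
    """Inverse-probability-weighted mean difference with a saturated PS model."""
    # Estimated PS = observed exposure proportion within each covariate pattern.
    cells = {}
    for row in rows:
        key = tuple(row[i] for i in ps_columns)
        n, n_exposed = cells.get(key, (0, 0))
        cells[key] = (n + 1, n_exposed + row[3])
    total, kept = 0.0, 0
    for row in rows:
        n, n_exposed = cells[tuple(row[i] for i in ps_columns)]
        ps = n_exposed / n
        if ps == 0.0 or ps == 1.0:
            continue  # no exposure contrast in this pattern; drop its members
        a, y = row[3], row[4]
        total += a * y / ps - (1 - a) * y / (1.0 - ps)
        kept += 1
    return total / kept


def run_simulation(reps=300, n=400, seed=0):
    """Return bias, variance, and MSE of the estimate for three PS models."""
    rng = random.Random(seed)
    # Column indices into each row: 0 = x1, 1 = x2, 2 = x3.
    models = {"X1": (0,), "X1+X2": (0, 1), "X1+X3": (0, 2)}
    estimates = {name: [] for name in models}
    for _ in range(reps):
        rows = simulate_cohort(n, rng)
        for name, cols in models.items():
            estimates[name].append(ipw_estimate(rows, cols))
    summary = {}
    for name, values in estimates.items():
        bias = mean(values) - TRUE_EFFECT
        variance = pvariance(values)
        summary[name] = {"bias": bias, "variance": variance,
                         "mse": bias ** 2 + variance}
    return summary


if __name__ == "__main__":
    for name, stats in run_simulation().items():
        print(name, {k: round(v, 4) for k, v in stats.items()})
```

Under these assumed settings, adding the outcome-only variable (X2) to the PS model should lower the variance of the estimate, and adding the exposure-only variable (X3) should raise it, while all three models remain approximately unbiased because each includes the confounder X1.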
Background
Adjusting for large numbers of covariates ascertained from patients’ health care claims data may improve control of confounding, as these variables may collectively be proxies for unobserved factors. Here we develop and test an algorithm that empirically identifies candidate covariates, prioritizes covariates, and integrates them into a propensity-score-based confounder adjustment model.
Methods
We developed a multi-step algorithm to implement high-dimensional proxy adjustment in claims data. Steps include (1) identifying data dimensions, e.g., diagnoses, procedures, and medications, (2) empirically identifying candidate covariates, (3) assessing recurrence of codes, (4) prioritizing covariates, (5) selecting covariates for adjustment, (6) estimating the exposure propensity score, and (7) estimating an outcome model. This algorithm was tested in Medicare claims data, including a study on the effect of Cox-2 inhibitors on reduced gastric toxicity compared to nonselective nonsteroidal anti-inflammatory drugs (NSAIDs).
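Steps (2) through (5) might look roughly like the following on toy data. This is a loose sketch, not the published algorithm: the abstract does not state how candidates are identified, how recurrence is coded, or how covariates are ranked, so the prevalence cutoff, the "once"/"repeat" indicators, and the ranking score (absolute log relative risk of the covariate-outcome association, with a 0.5 continuity correction) are all placeholder assumptions, as are the function names and the toy claims.

```python
import math
from collections import Counter


def candidate_codes(claims, top_n):
    """Step 2 (assumed rule): keep the top_n most prevalent codes per dimension."""
    patients_per_code = {}
    for pid, dim, code in claims:
        patients_per_code.setdefault((dim, code), set()).add(pid)
    by_dim = {}
    for (dim, code), pids in patients_per_code.items():
        by_dim.setdefault(dim, []).append((len(pids), code))
    return {dim: [code for _, code in sorted(items, key=lambda t: (-t[0], t[1]))[:top_n]]
            for dim, items in by_dim.items()}


def recurrence_covariates(claims, candidates, patients):
    """Step 3 (assumed coding): indicators for a code seen >= 1 and >= 2 times."""
    counts = Counter(claims)  # Counter returns 0 for unseen (pid, dim, code)
    covariates = {}
    for dim, codes in candidates.items():
        for code in codes:
            for label, threshold in (("once", 1), ("repeat", 2)):
                covariates[(dim, code, label)] = {
                    pid: int(counts[(pid, dim, code)] >= threshold) for pid in patients
                }
    return covariates


def priority_score(indicator, outcome):
    """Step 4 (placeholder score): |log RR| of outcome by covariate status."""
    with_cov = [outcome[p] for p, v in indicator.items() if v == 1]
    without = [outcome[p] for p, v in indicator.items() if v == 0]
    risk_with = (sum(with_cov) + 0.5) / (len(with_cov) + 1)
    risk_without = (sum(without) + 0.5) / (len(without) + 1)
    return abs(math.log(risk_with / risk_without))


def select_covariates(covariates, outcome, k):
    """Step 5: keep the k covariates with the highest priority score."""
    ranked = sorted(covariates, key=lambda c: (-priority_score(covariates[c], outcome), c))
    return ranked[:k]


# Hypothetical toy claims: (patient_id, data dimension, code).
claims = [
    (1, "dx", "ulcer"), (1, "dx", "ulcer"), (2, "dx", "ulcer"), (3, "dx", "ulcer"),
    (3, "dx", "gout"), (4, "dx", "gout"), (5, "dx", "rash"),
    (1, "rx", "steroid"), (2, "rx", "steroid"), (6, "rx", "statin"),
]
patients = [1, 2, 3, 4, 5, 6]
outcome = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0}

candidates = candidate_codes(claims, top_n=2)
covariates = recurrence_covariates(claims, candidates, patients)
selected = select_covariates(covariates, outcome, k=2)
```

The selected indicators would then enter the propensity score model in step (6) alongside any predefined covariates.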
Results
In a population of 49,653 new users of Cox-2 inhibitors or nonselective NSAIDs, a crude relative risk (RR) for upper GI toxicity (RR = 1.09 [95% confidence interval = 0.91–1.30]) was initially observed. Adjusting for 15 predefined covariates resulted in a possible gastroprotective effect (0.94 [0.78–1.12]). The gastroprotective effect became stronger when adjusting for an additional 500 algorithm-derived covariates (0.88 [0.73–1.06]). Results of a study on the effect of statins on reduced mortality were similar. Using the algorithm adjustment confirmed a null finding between influenza vaccination and hip fracture (1.02 [0.85–1.21]).
Conclusion
In typical pharmacoepidemiologic studies, the proposed high-dimensional propensity score resulted in improved effect estimates compared with adjustment limited to predefined covariates, when benchmarked against results expected from randomized trials.
OBJECTIVE-To develop and validate a single numeric comorbidity score for predicting short- and long-term mortality, by combining conditions in the Charlson and Elixhauser measures. STUDY DESIGN AND SETTING-In a cohort of 120,679 Pennsylvania Medicare enrollees with drug coverage through a pharmacy assistance program, we developed a single numeric comorbidity score for predicting 1-year mortality, by combining the conditions in the Charlson and Elixhauser measures. We externally validated the combined score in a cohort of New Jersey Medicare enrollees, by comparing its performance to that of both component scores in predicting 1-year mortality, as well as 180-, 90-, and 30-day mortality. CONCLUSION-In similar populations and data settings, the combined score may offer improvements in comorbidity summarization over existing scores.
RESULTS-C-statistics
If confirmed, these results suggest that conventional antipsychotic medications are at least as likely as atypical agents to increase the risk of death among elderly persons and that conventional drugs should not be used to replace atypical agents discontinued in response to the FDA warning.
The mean level of adherence to tamoxifen is high compared with other chronic medications. However, nearly one fourth of patients may be at risk for inadequate clinical response because of poor adherence. Because of the efficacy of tamoxifen therapy in preventing recurrence and death in women with early-stage breast cancer, further efforts are necessary to identify and prevent suboptimal adherence.
Comorbidity is an important confounder in epidemiologic studies. The authors compared the predictive performance of comorbidity scores for use in epidemiologic research with administrative databases. Study participants were British Columbia, Canada, residents aged ≥65 years who received angiotensin-converting enzyme inhibitors or calcium channel blockers at least once during the observation period. Six scores were computed for all 141,161 participants during the baseline year (1995-1996). Endpoints were death and health care utilization during a 12-month follow-up (1996-1997). Performance was measured by using the c statistic ranging from 0.5 for chance prediction of outcome to 1.0 for perfect prediction. In logistic regression models controlling for age and gender, four scores based on the International Classification of Diseases, Ninth Revision (ICD-9) generally performed better at predicting 1-year mortality (c = 0.771, c = 0.768, c = 0.745, c = 0.745) than medication-based Chronic Disease Score (CDS)-1 and CDS-2 (c = 0.738, c = 0.718). Number of distinct medications used was the best predictor of future physician visits (R² = 0.121) and expenditures (R² = 0.128) and a good predictor of mortality (c = 0.745). Combining ICD-9 and medication-based scores improved the c statistics (1.7% and 6.2%, respectively) for predicting mortality. Generalizability of results may be limited to an elderly, predominantly White population with equal access to state-funded health care.
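For readers unfamiliar with the c statistic used above, it can be computed as the proportion of event/non-event pairs in which the person with the event has the higher predicted score, counting ties as one half. The sketch below is a minimal pairwise implementation for a binary outcome; the function name and the example scores are illustrative and not taken from the study.

```python
def c_statistic(scores, outcomes):
    """Concordance (c) statistic for a binary outcome.

    scores:   predicted risks or comorbidity scores, one per person
    outcomes: 1 if the person had the event (e.g., died), otherwise 0
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for s_event in events:
        for s_non_event in non_events:
            if s_event > s_non_event:
                concordant += 1.0   # pair ranked correctly
            elif s_event == s_non_event:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical scores for four people, two of whom had the event.
example = c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

A score that assigns the same value to everyone yields 0.5, and a score that ranks every event above every non-event yields 1.0, matching the range described above. The pairwise loop is quadratic in cohort size, so for a cohort as large as the one in this study a rank-based formula would be the practical choice.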
Publication of results based on propensity score methods has increased dramatically, but there is little evidence that these methods yield substantially different estimates compared with conventional multivariable methods.