Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety.
Characterization of relationships between time-varying drug exposures and adverse events (AEs) related to health outcomes represents the primary objective in postmarketing drug safety surveillance. Such surveillance increasingly utilizes large-scale longitudinal observational databases (LODs), containing time-stamped patient-level medical information including periods of drug exposure and dates of diagnoses for millions of patients. Statistical methods for LODs must confront computational challenges related to the scale of the data, and must also address confounding and other biases that can undermine efforts to estimate effect sizes. Methods that compare on-drug with off-drug periods within patient offer specific advantages over between patient analysis on both counts. To accomplish these aims, we extend the self-controlled case series (SCCS) for LODs. SCCS implicitly controls for fixed multiplicative baseline covariates since each individual acts as their own control. In addition, only exposed cases are required for the analysis, which is computationally advantageous. The standard SCCS approach is usually used to assess single drugs and therefore estimates marginal associations between individual drugs and particular AEs. Such analyses ignore confounding drugs and interactions and have the potential to give misleading results. In order to avoid these difficulties, we propose a regularized multiple SCCS approach that incorporates potentially thousands or more of time-varying confounders such as other drugs. The approach successfully handles the high dimensionality and can provide a sparse solution via an L₁ regularizer. We present details of the model and the associated optimization procedure, as well as results of empirical investigations.
The SCCS recently extended to apply a multivariate adjustment for concomitant drug use offers promise as a statistical tool for risk identification in large-scale observational healthcare databases. Poor estimator calibration dampens enthusiasm, but on-going work should correct this short-coming.
Regulators such as the U.S. Food and Drug Administration have elaborate, multi-year processes for approving new drugs as safe and effective. Nonetheless, in recent years, several approved drugs have been withdrawn from the market because of serious and sometimes fatal side effects. We describe statistical methods for post-approval data analysis that attempt to detect drug safety problems as quickly as possible. Bayesian approaches are especially useful because of the high dimensionality of the data, and, in the future, for incorporating disparate sources of information.
A primary objective in the application of postmarketing drug safety surveillance is to ascertain the relationship between time-varying drug exposures and recurrent adverse events (AEs) related to health outcomes. The self-controlled case series (SCCS) method is one approach to analysis in this context. It is based on a conditional Poisson regression model, which assumes that events at different time points are conditionally independent given the covariate process. This requirement is problematic when the occurrence of an event can alter the future event risk. In a clinical setting, for example, patients who have a first myocardial infarction (MI) may be at higher subsequent risk for a second. In this work, we propose the positive dependence self-controlled case series (PD-SCCS) method: a generalization of SCCS that allows the occurrence of an event to increase the future event risk, yet maintains the advantages of the original model by controlling for fixed baseline covariates and relying solely on data from cases. As in the SCCS model, individual-level baseline parameters drop out of the PD-SCCS likelihood. Data sources used for postmarketing surveillance can contain tens of millions of people, so in this situation it is particularly advantageous that PD-SCCS avoids doing a costly estimation of individual parameters. We develop expressions for large sample inference and optimization for PD-SCCS and compare the results of our generalized model with the more restrictive SCCS approach.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.