The most popular regression model for the analysis of time-to-event data is the Cox proportional hazards model. While the model specifies a parametric relationship between the hazard function and the predictor variables, there is no specification regarding the form of the baseline hazard function. A critical assumption of the Cox model, however, is the proportional hazards assumption: when the predictor variables do not vary over time, the hazard ratio comparing any two observations is constant with respect to time. Therefore, to perform credible estimation and inference, one must first assess whether the proportional hazards assumption is reasonable. As with other regression techniques, it is also essential to examine whether appropriate functional forms of the predictor variables have been used, and whether there are any outlying or influential observations. This article reviews diagnostic methods for assessing goodness-of-fit for the Cox proportional hazards model. We illustrate these methods with a case study using available R functions, and provide complete R code for a simulated example as a supplement.
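The proportional hazards property described above can be made concrete with a minimal Python sketch (the abstract's own examples are in R; this is an illustrative stand-in, and the baseline hazard `h0` below is an arbitrary made-up function, since the Cox model leaves its form unspecified):

```python
import math

def cox_hazard(t, x, beta, baseline):
    """Hazard under a Cox model: h(t | x) = h0(t) * exp(beta * x)."""
    return baseline(t) * math.exp(beta * x)

# Hypothetical baseline hazard; any positive function of t would do.
h0 = lambda t: 0.5 * t ** 2 + 0.1
beta = 0.7

# Hazard ratio comparing x = 1 with x = 0 at several time points:
ratios = [cox_hazard(t, 1.0, beta, h0) / cox_hazard(t, 0.0, beta, h0)
          for t in (0.5, 1.0, 2.0, 5.0)]
# Under PH, every ratio equals exp(beta), whatever h0 is and whatever t is.
```

The diagnostics the article reviews amount to checking whether this time-constancy of the hazard ratio is plausible in the data.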
Geographically weighted regression (GWR) is a well-known statistical approach for exploring spatial non-stationarity of regression relationships in spatial data analysis. In this paper, we discuss a Bayesian treatment of GWR. Bayesian variable selection based on a spike-and-slab prior, bandwidth selection based on a range prior, and model assessment using a modified deviance information criterion and a modified logarithm of the pseudo-marginal likelihood are fully discussed. The use of graph distance in modeling areal data is also introduced. Extensive simulation studies are carried out to examine the empirical performance of the proposed methods under scenarios with both small and large numbers of locations, and a comparison with classical frequentist GWR is made. The variable selection and estimation performance of the proposed methodology is satisfactory under a range of circumstances. We further apply the proposed methodology to province-level macroeconomic data from thirty selected provinces in China. The estimation and variable selection results reveal convincing insights about China's economy that agree with previous studies and known facts.
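The core GWR mechanic, local regressions with kernel weights that decay with distance, can be sketched in a few lines. This is the classical frequentist fit used as the comparison baseline, not the Bayesian machinery of the paper, and the single-covariate model and Gaussian kernel are simplifying assumptions:

```python
import math

def gwr_fit_at(u, coords, xs, ys, bandwidth):
    """Fit y = b0(u) + b1(u) * x by weighted least squares at location u,
    with Gaussian kernel weights decaying with distance from u."""
    w = [math.exp(-0.5 * ((cx - u[0]) ** 2 + (cy - u[1]) ** 2) / bandwidth ** 2)
         for cx, cy in coords]
    # Solve the 2x2 weighted normal equations by hand.
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = sw * swxx - swx * swx
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det
    return b0, b1

# When the relationship is the same everywhere (y = 1 + 2x), every
# location recovers the same coefficients, whatever the bandwidth.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x for x in xs]
b0, b1 = gwr_fit_at((0.3, 0.7), coords, xs, ys, bandwidth=0.5)
```

The bandwidth controls how quickly the weights decay, i.e., how local each fit is; it is exactly this quantity that the paper endows with a range prior. For areal data, the Euclidean distance in the kernel would be replaced by the graph distance introduced in the paper.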
The Cox proportional hazards model is one of the most popular tools for analyzing time-to-event data in public health studies. When outcomes observed in clinical data from different regions vary in a pattern correlated with location, it is often of great interest to investigate spatially varying effects of covariates. In this paper, we propose a geographically weighted Cox regression model for sparse spatial survival data. In addition, a stochastic neighborhood weighting scheme is introduced at the county level. Theoretical properties of the proposed geographically weighted estimators are examined in detail. A model selection scheme based on the model-robust Takeuchi information criterion (TIC) is discussed. Extensive simulation studies are carried out to examine the empirical performance of the proposed methods. We further apply the proposed methodology to analyze real data on prostate cancer from the Surveillance, Epidemiology, and End Results cancer registry for the state of Louisiana.
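A minimal sketch of the kind of weighted partial likelihood such a model involves, assuming a single covariate, no tied event times, and one common way of attaching kernel weights to subjects' contributions (the paper's exact stochastic neighborhood weighting scheme is not reproduced here):

```python
import math

def weighted_cox_loglik(beta, times, events, x, w):
    """Geographically weighted Cox partial log-likelihood (sketch).
    w[i] is a spatial kernel weight for subject i; w[i] = 1 for all i
    recovers the ordinary partial likelihood. Assumes no tied times."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    ll = 0.0
    for k, i in enumerate(order):
        if not events[i]:          # censored subjects contribute no term
            continue
        risk_set = order[k:]       # subjects still at risk at times[i]
        denom = sum(w[j] * math.exp(beta * x[j]) for j in risk_set)
        ll += w[i] * (beta * x[i] - math.log(denom))
    return ll

# Two subjects, both events, at beta = 0 the log-likelihood is -log(2):
ll = weighted_cox_loglik(0.0, [1.0, 2.0], [1, 1], [1.0, 0.0], [1.0, 1.0])
```

Fitting the model at each county would maximize this local log-likelihood with weights centered at that county; variants in the literature differ in whether the weights also enter the risk-set sum.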
The Cox model—which remains the first choice for analyzing time‐to‐event data, even for large data sets—relies on the proportional hazards (PH) assumption. When survival data arrive sequentially in chunks, a fast and minimally storage-intensive approach to testing the PH assumption is desirable. We propose an online updating approach that updates the standard test statistic as each new block of data becomes available and greatly lightens the computational burden. Under the null hypothesis of PH, the proposed statistic is shown to have the same asymptotic distribution as the standard version computed on an entire data stream with the data blocks pooled into one data set. In simulation studies, the test and its variant based on the most recent data blocks maintain their sizes when the PH assumption holds and have substantial power to detect different violations of the PH assumption. We also show in simulation that our approach can be used successfully with “big data” that exceed a single computer's computational resources. The approach is illustrated with the survival analysis of patients with lymphoma from the Surveillance, Epidemiology, and End Results Program. The proposed test promptly identified deviation from the PH assumption, which was not captured by the test based on the entire data.
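The statistic itself is derived from the PH test, but the online-updating pattern it relies on can be illustrated generically: keep only running block-level summaries, never the raw blocks. The schematic below uses simple running sums as stand-ins for the actual score-type quantities, so the match with pooled computation is exact by construction:

```python
class OnlineAccumulator:
    """Schematic of online updating: each data block updates a few
    running summaries and is then discarded, so storage stays O(1)
    in the number of blocks. Sums and sums of squares stand in for
    the score-type quantities the actual test statistic accumulates."""

    def __init__(self):
        self.n = 0
        self.s = 0.0
        self.ss = 0.0

    def update(self, block):
        """Fold one new block of observations into the summaries."""
        self.n += len(block)
        self.s += sum(block)
        self.ss += sum(v * v for v in block)

    def mean(self):
        return self.s / self.n

    def variance(self):
        return self.ss / self.n - self.mean() ** 2

# Streaming two blocks reproduces the pooled result exactly.
acc = OnlineAccumulator()
acc.update([1.0, 2.0, 3.0])
acc.update([4.0, 5.0])
```

The paper's result is the nontrivial part: the updated test statistic, unlike these toy sums, is only asymptotically equivalent to its pooled counterpart under the PH null.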
In this paper, we develop a group learning approach to analyze the underlying heterogeneity structure of shot selection among professional basketball players in the NBA. We propose a mixture of finite mixtures (MFM) model to capture the heterogeneity of shot selection among different players based on the log Gaussian Cox process (LGCP). Our proposed method can simultaneously estimate the number of groups and the group configurations. An efficient Markov chain Monte Carlo (MCMC) algorithm is developed for our proposed model. Simulation studies have been conducted to demonstrate its performance. Finally, our proposed learning approach is further illustrated by analyzing shot charts of selected players in the NBA's 2017-2018 regular season.

KEYWORDS
basketball shot charts, heterogeneity pursuit, log Gaussian Cox process, mixture of finite mixtures, nonparametric Bayesian

INTRODUCTION
In basketball data analytics, one primary problem of research interest is to study how players choose the locations from which they attempt shots. Shot charts, which are graphical representations of players' shot location selections, provide an important summary for basketball coaches as well as teams' data analysts, as no good defensive strategies can be made without understanding the shot selection habits of players on rival teams. Shot selection data have been discussed from different statistical perspectives. Reich, Hodges, Carlin, and Reich (2006) developed a spatially varying coefficients model for shot-chart data, in which the court is divided into small regions and the probability of making a shot in each zone is modeled using a multinomial logit approach. Recognizing the random nature of shot location selection, Miller et al. (2014) analyzed the underlying spatial structure among professional basketball players based on spatial point processes.
Franks, Miller, Bornn, and Goldsberry (2015) combined spatial and spatio-temporal processes, matrix factorization techniques, and hierarchical regression models to characterize the spatial structure of shot attempt locations. In spatial point processes, point locations are assumed random and are regarded as realizations of a process governed by an underlying intensity. Spatial point processes are well established in the statistical literature, including the Poisson process (Geyer, 1998), the Gibbs process (Goulard, Särkkä, & Grabarnik, 1996), and the log Gaussian Cox process (LGCP; Møller, Syversveen, & Waagepetersen, 1998). In addition, they have been applied in different areas, such as ecological studies (Jiao, Hu, & Yan, 2020; Thurman, Fu,
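The intensity-driven view of point locations described above can be sketched as follows. In a real LGCP the log-intensity surface is a draw from a Gaussian process; here a fixed smooth surface stands in for that draw so the example stays deterministic, and the court is idealized as the unit square:

```python
import math

def intensity(s1, s2):
    """Stand-in LGCP intensity lambda(s) = exp(mu + z(s)) on the unit
    square. A genuine LGCP would draw z from a Gaussian process; this
    fixed smooth z is an illustrative placeholder."""
    mu = 1.0
    z = 0.5 * math.cos(3.0 * s1) * math.sin(2.0 * s2)
    return math.exp(mu + z)

def expected_count(n=200):
    """Expected number of points: the integral of lambda over the
    square, approximated by a midpoint Riemann sum on an n x n grid."""
    h = 1.0 / n
    return sum(intensity((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h
```

Given an intensity surface like this, point patterns (e.g., shot locations) can be simulated by thinning a homogeneous Poisson process, and regions where `intensity` is large are where points concentrate.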
Objective: To improve on existing methods to infer race/ethnicity in health care data through an analysis of birth records from Connecticut. Data Source: A total of 162,467 Connecticut birth records from 2009 to 2013. Study Design: We developed a logistic model to predict race/ethnicity using data from the US Census and patient‐level information. Model performance was tested and compared to previous studies. Five performance measures were used for comparison. Principal Findings: Our full model correctly classifies 81 percent of subjects and shows improvement over extant methods. We achieved substantially improved sensitivity in predicting black race. Conclusions: Predictive models using Census information and patients’ demographic characteristics can be used to accurately populate race/ethnicity information in health care databases, enhancing opportunities to investigate and address disparities in access to, utilization of, and outcomes of care.
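The prediction step of such a model is a standard logistic regression. The sketch below shows the mechanics only; the feature names, coefficient values, and intercept are made-up placeholders, not the fitted model from the study:

```python
import math

def predict_race_prob(features, coefs, intercept):
    """Logistic model: p = sigmoid(intercept + x . beta), combining
    Census-derived and patient-level features. All names and numbers
    used with this function below are illustrative, not estimates."""
    z = intercept + sum(coefs[k] * features.get(k, 0.0) for k in coefs)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical inputs: a Census-tract composition feature plus a
# patient-level indicator, with invented coefficients.
coefs = {"tract_pct_black": 3.0, "medicaid": 0.8}
p = predict_race_prob({"tract_pct_black": 0.6, "medicaid": 1.0},
                      coefs, intercept=-1.5)
```

In practice the predicted probabilities would be thresholded or used directly to populate the missing race/ethnicity field, and evaluated with measures such as sensitivity, as the abstract describes.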