The most popular regression model for the analysis of time-to-event data is the Cox proportional hazards model. While the model specifies a parametric relationship between the hazard function and the predictor variables, there is no specification regarding the form of the baseline hazard function. A critical assumption of the Cox model, however, is the proportional hazards assumption: when the predictor variables do not vary over time, the hazard ratio comparing any two observations is constant with respect to time. Therefore, to perform credible estimation and inference, one must first assess whether the proportional hazards assumption is reasonable. As with other regression techniques, it is also essential to examine whether appropriate functional forms of the predictor variables have been used, and whether there are any outlying or influential observations. This article reviews diagnostic methods for assessing goodness-of-fit for the Cox proportional hazards model. We illustrate these methods with a case study using available R functions, and provide complete R code for a simulated example as a supplement.
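The proportional hazards property described above can be made concrete with a minimal Python sketch (the abstract's own examples are in R; this is an illustrative stand-in, and the baseline hazard `h0` below is an arbitrary made-up function, since the Cox model leaves its form unspecified):

```python
import math

def cox_hazard(t, x, beta, baseline):
    """Hazard under a Cox model: h(t | x) = h0(t) * exp(beta * x)."""
    return baseline(t) * math.exp(beta * x)

# Hypothetical baseline hazard; any positive function of t would do.
h0 = lambda t: 0.5 * t ** 2 + 0.1
beta = 0.7

# Hazard ratio comparing x = 1 with x = 0 at several time points:
ratios = [cox_hazard(t, 1.0, beta, h0) / cox_hazard(t, 0.0, beta, h0)
          for t in (0.5, 1.0, 2.0, 5.0)]
# Under PH, every ratio equals exp(beta), whatever h0 is and whatever t is.
```

The diagnostics the article reviews amount to checking whether this time-constancy of the hazard ratio is plausible in the data.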
Geographically weighted regression (GWR) is a well-known statistical approach for exploring spatial non-stationarity of regression relationships in spatial data analysis. In this paper, we discuss a Bayesian treatment of GWR. Bayesian variable selection based on a spike-and-slab prior, bandwidth selection based on a range prior, and model assessment using a modified deviance information criterion and a modified logarithm of the pseudo-marginal likelihood are fully discussed. The use of graph distance in modeling areal data is also introduced. Extensive simulation studies are carried out to examine the empirical performance of the proposed methods under scenarios with both small and large numbers of locations, and a comparison with classical frequentist GWR is made. The variable selection and estimation performance of the proposed methodology is satisfactory under a range of circumstances. We further apply the proposed methodology to province-level macroeconomic data from thirty selected provinces in China. The estimation and variable selection results reveal convincing insights about China's economy that agree with previous studies and known facts.
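The core GWR mechanic, local regressions with kernel weights that decay with distance, can be sketched in a few lines. This is the classical frequentist fit used as the comparison baseline, not the Bayesian machinery of the paper, and the single-covariate model and Gaussian kernel are simplifying assumptions:

```python
import math

def gwr_fit_at(u, coords, xs, ys, bandwidth):
    """Fit y = b0(u) + b1(u) * x by weighted least squares at location u,
    with Gaussian kernel weights decaying with distance from u."""
    w = [math.exp(-0.5 * ((cx - u[0]) ** 2 + (cy - u[1]) ** 2) / bandwidth ** 2)
         for cx, cy in coords]
    # Solve the 2x2 weighted normal equations by hand.
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = sw * swxx - swx * swx
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det
    return b0, b1

# When the relationship is the same everywhere (y = 1 + 2x), every
# location recovers the same coefficients, whatever the bandwidth.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x for x in xs]
b0, b1 = gwr_fit_at((0.3, 0.7), coords, xs, ys, bandwidth=0.5)
```

The bandwidth controls how quickly the weights decay, i.e., how local each fit is; it is exactly this quantity that the paper endows with a range prior. For areal data, the Euclidean distance in the kernel would be replaced by the graph distance introduced in the paper.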
The Cox proportional hazards model is one of the most popular tools for analyzing time-to-event data in public health studies. When outcomes observed in clinical data from different regions vary in a pattern correlated with location, it is often of great interest to investigate spatially varying effects of covariates. In this paper, we propose a geographically weighted Cox regression model for sparse spatial survival data. In addition, a stochastic neighborhood weighting scheme is introduced at the county level. Theoretical properties of the proposed geographically weighted estimators are examined in detail. A model selection scheme based on the model-robust Takeuchi information criterion (TIC) is discussed. Extensive simulation studies are carried out to examine the empirical performance of the proposed methods. We further apply the proposed methodology to analyze real data on prostate cancer from the Surveillance, Epidemiology, and End Results cancer registry for the state of Louisiana.
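A minimal sketch of the kind of weighted partial likelihood such a model involves, assuming a single covariate, no tied event times, and one common way of attaching kernel weights to subjects' contributions (the paper's exact stochastic neighborhood weighting scheme is not reproduced here):

```python
import math

def weighted_cox_loglik(beta, times, events, x, w):
    """Geographically weighted Cox partial log-likelihood (sketch).
    w[i] is a spatial kernel weight for subject i; w[i] = 1 for all i
    recovers the ordinary partial likelihood. Assumes no tied times."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    ll = 0.0
    for k, i in enumerate(order):
        if not events[i]:          # censored subjects contribute no term
            continue
        risk_set = order[k:]       # subjects still at risk at times[i]
        denom = sum(w[j] * math.exp(beta * x[j]) for j in risk_set)
        ll += w[i] * (beta * x[i] - math.log(denom))
    return ll

# Two subjects, both events, at beta = 0 the log-likelihood is -log(2):
ll = weighted_cox_loglik(0.0, [1.0, 2.0], [1, 1], [1.0, 0.0], [1.0, 1.0])
```

Fitting the model at each county would maximize this local log-likelihood with weights centered at that county; variants in the literature differ in whether the weights also enter the risk-set sum.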
The Cox model—which remains the first choice for analyzing time‐to‐event data, even for large data sets—relies on the proportional hazards (PH) assumption. When survival data arrive sequentially in chunks, a fast and minimally storage-intensive approach to testing the PH assumption is desirable. We propose an online updating approach that updates the standard test statistic as each new block of data becomes available and greatly lightens the computational burden. Under the null hypothesis of PH, the proposed statistic is shown to have the same asymptotic distribution as the standard version computed on an entire data stream with the data blocks pooled into one data set. In simulation studies, the test and its variant based on the most recent data blocks maintain their sizes when the PH assumption holds and have substantial power to detect different violations of the PH assumption. We also show in simulation that our approach can be used successfully with “big data” that exceed a single computer's computational resources. The approach is illustrated with the survival analysis of patients with lymphoma from the Surveillance, Epidemiology, and End Results Program. The proposed test promptly identified deviation from the PH assumption, which was not captured by the test based on the entire data.
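The statistic itself is derived from the PH test, but the online-updating pattern it relies on can be illustrated generically: keep only running block-level summaries, never the raw blocks. The schematic below uses simple running sums as stand-ins for the actual score-type quantities, so the match with pooled computation is exact by construction:

```python
class OnlineAccumulator:
    """Schematic of online updating: each data block updates a few
    running summaries and is then discarded, so storage stays O(1)
    in the number of blocks. Sums and sums of squares stand in for
    the score-type quantities the actual test statistic accumulates."""

    def __init__(self):
        self.n = 0
        self.s = 0.0
        self.ss = 0.0

    def update(self, block):
        """Fold one new block of observations into the summaries."""
        self.n += len(block)
        self.s += sum(block)
        self.ss += sum(v * v for v in block)

    def mean(self):
        return self.s / self.n

    def variance(self):
        return self.ss / self.n - self.mean() ** 2

# Streaming two blocks reproduces the pooled result exactly.
acc = OnlineAccumulator()
acc.update([1.0, 2.0, 3.0])
acc.update([4.0, 5.0])
```

The paper's result is the nontrivial part: the updated test statistic, unlike these toy sums, is only asymptotically equivalent to its pooled counterpart under the PH null.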
In this paper, we develop a group learning approach to analyze the underlying heterogeneity structure of shot selection among professional basketball players in the NBA. We propose a mixture of finite mixtures (MFM) model to capture the heterogeneity of shot selection among different players based on the log Gaussian Cox process (LGCP). Our proposed method can simultaneously estimate the number of groups and the group configurations. An efficient Markov chain Monte Carlo (MCMC) algorithm is developed for our proposed model. Simulation studies have been conducted to demonstrate its performance. Finally, our proposed learning approach is further illustrated by analyzing shot charts of selected players in the NBA's 2017-2018 regular season.

KEYWORDS
basketball shot charts, heterogeneity pursuit, log Gaussian Cox process, mixture of finite mixtures, nonparametric Bayesian

INTRODUCTION
In basketball data analytics, one primary problem of research interest is to study how players choose the locations from which they attempt shots. Shot charts, which are graphical representations of players' shot location selections, provide an important summary for basketball coaches as well as teams' data analysts, as no good defensive strategies can be made without understanding the shot selection habits of players on rival teams. Shot selection data have been discussed from different statistical perspectives. Reich, Hodges, Carlin, and Reich (2006) developed a spatially varying coefficients model for shot-chart data, in which the court is divided into small regions and the probability of making a shot in each zone is modeled using a multinomial logit approach. Recognizing the random nature of shot location selection, Miller et al. (2014) analyzed the underlying spatial structure among professional basketball players based on spatial point processes.
Franks, Miller, Bornn, and Goldsberry (2015) combined spatial and spatio-temporal processes, matrix factorization techniques, and hierarchical regression models to characterize the spatial structure of shot attempt locations. In spatial point processes, point locations are assumed random and are regarded as realizations of a process governed by an underlying intensity. Spatial point processes are well established in the statistical literature, including the Poisson process (Geyer, 1998), the Gibbs process (Goulard, Särkkä, & Grabarnik, 1996), and the log Gaussian Cox process (LGCP; Møller, Syversveen, & Waagepetersen, 1998). In addition, they have been applied in different areas, such as ecological studies (Jiao, Hu, & Yan, 2020; Thurman, Fu,
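The intensity-driven view of point locations described above can be sketched as follows. In a real LGCP the log-intensity surface is a draw from a Gaussian process; here a fixed smooth surface stands in for that draw so the example stays deterministic, and the court is idealized as the unit square:

```python
import math

def intensity(s1, s2):
    """Stand-in LGCP intensity lambda(s) = exp(mu + z(s)) on the unit
    square. A genuine LGCP would draw z from a Gaussian process; this
    fixed smooth z is an illustrative placeholder."""
    mu = 1.0
    z = 0.5 * math.cos(3.0 * s1) * math.sin(2.0 * s2)
    return math.exp(mu + z)

def expected_count(n=200):
    """Expected number of points: the integral of lambda over the
    square, approximated by a midpoint Riemann sum on an n x n grid."""
    h = 1.0 / n
    return sum(intensity((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h
```

Given an intensity surface like this, point patterns (e.g., shot locations) can be simulated by thinning a homogeneous Poisson process, and regions where `intensity` is large are where points concentrate.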
Objective: To improve on existing methods to infer race/ethnicity in health care data through an analysis of birth records from Connecticut. Data Source: A total of 162,467 Connecticut birth records from 2009 to 2013. Study Design: We developed a logistic model to predict race/ethnicity using data from the US Census and patient‐level information. Model performance was tested and compared to previous studies. Five performance measures were used for comparison. Principal Findings: Our full model correctly classifies 81 percent of subjects and shows improvement over extant methods. We achieved substantially improved sensitivity in predicting black race. Conclusions: Predictive models using Census information and patients’ demographic characteristics can be used to accurately populate race/ethnicity information in health care databases, enhancing opportunities to investigate and address disparities in access to, utilization of, and outcomes of care.
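The prediction step of such a model is a standard logistic regression. The sketch below shows the mechanics only; the feature names, coefficient values, and intercept are made-up placeholders, not the fitted model from the study:

```python
import math

def predict_race_prob(features, coefs, intercept):
    """Logistic model: p = sigmoid(intercept + x . beta), combining
    Census-derived and patient-level features. All names and numbers
    used with this function below are illustrative, not estimates."""
    z = intercept + sum(coefs[k] * features.get(k, 0.0) for k in coefs)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical inputs: a Census-tract composition feature plus a
# patient-level indicator, with invented coefficients.
coefs = {"tract_pct_black": 3.0, "medicaid": 0.8}
p = predict_race_prob({"tract_pct_black": 0.6, "medicaid": 1.0},
                      coefs, intercept=-1.5)
```

In practice the predicted probabilities would be thresholded or used directly to populate the missing race/ethnicity field, and evaluated with measures such as sensitivity, as the abstract describes.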