It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid "post-selection inference" by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention intervals. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing "simultaneity insurance" for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always less conservative than full Scheffé protection. Importantly, it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results. Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics; DOI: 10.1214/12-AOS1077.
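The simultaneity construction described above can be sketched numerically: under a global null with known error variance, simulate the maximum |t|-statistic over the coefficient estimates of every submodel and read off its 95% quantile, which is the constant used to widen the intervals. The design matrix, dimensions, and simulation sizes below are illustrative choices, not the paper's computations.

```python
# Monte Carlo sketch of the simultaneity constant: the max |t| over all
# submodel coefficient estimates, compared with the Scheffe sup over all
# directions in the column space of X (sigma = 1 assumed known).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.standard_normal((n, p))

# For every nonempty submodel M and every j in M, the t-statistic of the
# coefficient estimate is v'Z for a unit vector v.
directions = []
for k in range(1, p + 1):
    for M in itertools.combinations(range(p), k):
        H = np.linalg.pinv(X[:, M])        # rows give coefficient estimates
        for row in H:
            directions.append(row / np.linalg.norm(row))
V = np.array(directions)

reps = 2000
Z = rng.standard_normal((n, reps))         # errors under the global null
max_abs_t = np.abs(V @ Z).max(axis=0)      # simultaneous max over submodels
K_posi = np.quantile(max_abs_t, 0.95)

Q, _ = np.linalg.qr(X)                     # Scheffe: sup over ALL directions
K_scheffe = np.quantile(np.linalg.norm(Q.T @ Z, axis=0), 0.95)
print(K_posi, K_scheffe)                   # simultaneity constant < Scheffe
```

Because every submodel direction lies in the column space of X, the simulated maximum is dominated pointwise by the Scheffé statistic, so the resulting constant is smaller, as the abstract asserts.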
The CAD/CAM orthodontic bracket system evaluated in this study was as effective on treatment outcome measures as standard brackets bonded either directly or indirectly. The CAD/CAM appliance was more efficient with respect to treatment duration, although the decrease in total archwire appointments was minimal. Further investigation is needed to better quantify the clinical benefits of CAD/CAM orthodontic appliances.
We study the problem of nonparametric dependence detection. Many existing methods may suffer severe power loss due to non-uniform consistency, which we illustrate with a paradox. To avoid such power loss, we approach the nonparametric test of independence through the new framework of binary expansion statistics (BEStat) and binary expansion testing (BET), which examine dependence through a novel binary expansion filtration approximation of the copula. Through a Hadamard transform, we find that the symmetry statistics in the filtration are complete sufficient statistics for dependence. These statistics are also uncorrelated under the null. By utilizing symmetry statistics, the BET avoids the problem of non-uniform consistency and improves upon a wide class of commonly used methods (a) by achieving the minimax rate in sample size requirement for reliable power and (b) by providing clear interpretations of global relationships upon rejection of independence. The binary expansion approach also connects the symmetry statistics with the current computing system to facilitate efficient bitwise implementation. We illustrate the BET with a study of the distribution of stars in the night sky and with an exploratory data analysis of the TCGA breast cancer data.
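A toy version of the binary expansion test can be sketched as follows: rank the data to pseudo-uniforms, take the first d binary digits, convert each digit to a ±1 sign, and form the symmetry statistics as sums of products of digit signs (equivalently, a Hadamard transform of the cell counts of the discretized copula). The depth, sample size, threshold-free comparison, and V-shaped alternative below are illustrative choices, not the paper's calibration.

```python
# Toy sketch of binary expansion testing (BET) at depth d = 2.
import itertools
import numpy as np

def symmetry_stats(x, y, d=2):
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 0.5) / n   # pseudo-uniform ranks
    v = (np.argsort(np.argsort(y)) + 0.5) / n
    # sign of binary digit k of u: +1 if floor(u * 2^k) is odd, else -1
    su = np.array([2 * (np.floor(u * 2 ** k) % 2) - 1 for k in range(1, d + 1)])
    sv = np.array([2 * (np.floor(v * 2 ** k) % 2) - 1 for k in range(1, d + 1)])
    stats = {}
    for ka in range(1, d + 1):
        for A in itertools.combinations(range(d), ka):
            for kb in range(1, d + 1):
                for B in itertools.combinations(range(d), kb):
                    sa = np.prod(su[list(A)], axis=0)   # U-digit interaction
                    sb = np.prod(sv[list(B)], axis=0)   # V-digit interaction
                    stats[(A, B)] = float(np.sum(sa * sb))
    return stats

rng = np.random.default_rng(1)
n = 512
x = rng.uniform(size=n)
y_dep = np.abs(2 * x - 1) + 0.05 * rng.standard_normal(n)  # V-shaped dependence
y_ind = rng.uniform(size=n)                                 # independent control

m_dep = max(abs(s) for s in symmetry_stats(x, y_dep).values())
m_ind = max(abs(s) for s in symmetry_stats(x, y_ind).values())
print(m_dep, m_ind)   # dependence shows up as one large symmetry statistic
```

The V-shaped example illustrates the interpretability claim: the largest statistic identifies which interaction of binary digits carries the dependence, here the product of both U-digits against the first V-digit.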
During a few years around the turn of the millennium, a series of local hospitals in Philadelphia closed their obstetrics units, with the consequence that many mothers-to-be arrived unexpectedly at the city's large, regional teaching hospitals whose obstetrics units remained open. Nothing comparable happened in other United States cities, where there were only sporadic changes in the availability of obstetrics units. What effect did these closures have on mothers and their newborns? We study this question by comparing Philadelphia before and after the closures to a control Philadelphia constructed from elsewhere in Pennsylvania, California, and Missouri, matching mothers on 59 observed covariates including year of birth. The analysis focuses on the period 1995-1996, when there were no closures, and the period 1997-1999, when five hospitals abruptly closed their obstetrics units. Using a new sensitivity analysis for difference-in-differences with binary outcomes, we examine the possibility that Philadelphia mothers differed from control mothers in terms of some covariate not measured, and perhaps the distribution of that unobserved covariate changed in a different way in Philadelphia and control-Philadelphia in the years before and after the closures. We illustrate two recently proposed techniques for the design and analysis of observational studies, namely split samples and evidence factors. To boost insensitivity to unmeasured bias, we drew a small random planning sample of about 26,000 mothers in 13,000 pairs and used them to frame hypotheses that promised to be less sensitive to bias; these hypotheses were then tested on the large, independent complementary analysis sample of nearly 240,000 mothers in 120,000 pairs.
The splitting was successful twice over: (i) it identified an interesting and moderately insensitive conclusion, and (ii) by comparison of the planning and analysis samples, it clearly avoided an exaggerated claim of insensitivity to unmeasured bias that might have occurred by focusing on the least sensitive of many findings. Also, we identified two approximate evidence factors and one test for unmeasured bias: (i) factor 1 compared Philadelphia to control before and after the closures; (ii) factor 2 focused on the years 1997-1999 of abrupt closures and compared zip codes with closures to zip codes without closures; (iii) the test for bias focused on the years 1995-1996, prior to the closures, and compared zip codes that would have closures in 1997-1999 to zip codes that would not; any ostensible effect found in that last comparison is surely bias from the characteristics of the Philadelphia zip codes in which closures took place. Approximate evidence factors provide nearly independent tests of a null hypothesis such that the evidence in each factor would be unaffected by certain biases that would invalidate the other factor.
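For intuition about sensitivity analysis with matched binary pairs, the classical Rosenbaum bound for McNemar's test can be sketched: with D discordant pairs of which T have the treated unit positive, an unmeasured bias of magnitude Γ lets the per-pair probability of a treated-positive discordance rise to Γ/(1 + Γ), and the worst-case one-sided p-value is the corresponding binomial tail. This is the standard matched-pairs bound, not the new difference-in-differences procedure the abstract describes, and the counts below are hypothetical.

```python
# Rosenbaum-style sensitivity bound for McNemar's test on matched pairs.
from math import comb

def worst_case_p(T, D, gamma):
    # worst-case probability of a treated-positive discordant pair
    q = gamma / (1 + gamma)
    # one-sided binomial tail P(Binomial(D, q) >= T)
    return sum(comb(D, t) * q ** t * (1 - q) ** (D - t) for t in range(T, D + 1))

D, T = 1000, 560            # hypothetical discordant-pair counts
for gamma in (1.0, 1.1, 1.2, 1.3):
    print(gamma, worst_case_p(T, D, gamma))
```

At Γ = 1 (no bias) this reproduces the usual McNemar p-value; the Γ at which the bound first exceeds 0.05 measures how much unmeasured bias would be needed to explain away the finding.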
Sparse regression techniques have been popular in recent years because of their ability to handle high-dimensional data with built-in variable selection. The lasso is perhaps the best-known example. Despite intensive work in this direction, providing valid inference for sparse regularized methods remains a challenging statistical problem. We take a unique point of view of this problem and propose to use stochastic variational inequality techniques from optimization to derive confidence intervals and regions for the lasso. Some theoretical properties of the procedure are obtained. Both simulated and real data examples are used to demonstrate the performance of the method.
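The variational-inequality view of the lasso can be made concrete through its KKT conditions: at the solution, X'(y − Xβ)/n equals λ·sign(βⱼ) on the active set and lies in [−λ, λ] elsewhere. The sketch below fits the lasso by coordinate descent on simulated data and verifies that characterization; the paper's confidence-interval construction itself is not reproduced, and the problem sizes and λ are illustrative.

```python
# Coordinate-descent lasso fit plus a check of the KKT (variational
# inequality) conditions that characterize the solution.
import numpy as np

def lasso_cd(X, y, lam, iters=500):
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(iters):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]          # partial residual
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0) / col_sq[j]
    return beta

rng = np.random.default_rng(4)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + rng.standard_normal(n)

lam = 0.2
beta = lasso_cd(X, y, lam)
grad = X.T @ (y - X @ beta) / n
active = beta != 0
# KKT: grad = lam * sign(beta) on the active set, |grad| <= lam elsewhere
print(np.allclose(grad[active], lam * np.sign(beta[active]), atol=1e-5))
print(np.all(np.abs(grad) <= lam + 1e-5))
```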
A major goal in neuroscience is to understand the neural pathways underlying human behavior. We introduce the recently developed Joint and Individual Variation Explained (JIVE) method to the neuroscience community to simultaneously analyze imaging and behavioral data from the Human Connectome Project. Motivated by recent computational and theoretical improvements in the JIVE approach, we simultaneously explore the joint and individual variation between and within imaging and behavioral data. In particular, we demonstrate that JIVE is an effective and efficient approach for integrating task fMRI and behavioral variables using three examples: one where task variation is strong, one where task variation is weak, and a reference case where the behavior is not directly related to the imaging data. These examples are provided to visualize the different levels of signal found in the joint variation, including working memory regions in the image data and accuracy and response time from the in-task behavioral variables. Joint analysis provides insights not available from conventional single-block decomposition methods such as the Singular Value Decomposition. Additionally, the joint variation estimated by JIVE appears to identify the working memory regions more clearly than Partial Least Squares (PLS), while Canonical Correlation Analysis (CCA) gives grossly overfit results. The individual variation in JIVE captures behavior-unrelated signals, such as a spatially homogeneous background activation and activation in the default mode network. The information revealed by this individual variation is not examined in traditional methods such as CCA and PLS. We suggest that JIVE can be used as an alternative to PLS and CCA to improve estimation of the signal common to two or more datasets and reveal novel insights into the signal unique to each dataset.
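The joint/individual split can be conveyed by a one-pass, non-iterative sketch of a JIVE-style decomposition: estimate the joint variation from an SVD of the two concatenated blocks, project it out, and take a per-block SVD of the remainder as the individual variation. The published JIVE algorithm estimates the ranks and iterates to convergence; the fixed ranks and simulated blocks below are assumptions for illustration only.

```python
# One-pass sketch of a JIVE-style joint/individual decomposition of two
# data blocks (features x samples) that share the sample dimension.
import numpy as np

def jive_sketch(X1, X2, r_joint=1, r_indiv=1):
    stacked = np.vstack([X1, X2])
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    V_joint = Vt[:r_joint]                    # joint sample-space basis
    P = V_joint.T @ V_joint                   # projector onto joint space
    out = []
    for X in (X1, X2):
        J = X @ P                             # joint part of this block
        R = X - J
        Ur, sr, Vr = np.linalg.svd(R, full_matrices=False)
        A = (Ur[:, :r_indiv] * sr[:r_indiv]) @ Vr[:r_indiv]  # individual part
        out.append((J, A, X - J - A))         # joint, individual, residual
    return out

# Simulated example: a shared score drives both blocks; block 2 also has
# its own individual signal.
rng = np.random.default_rng(2)
n = 200
shared = rng.standard_normal(n)
X1 = np.outer(rng.standard_normal(30), shared) + 0.1 * rng.standard_normal((30, n))
X2 = (np.outer(rng.standard_normal(40), shared)
      + np.outer(rng.standard_normal(40), rng.standard_normal(n))
      + 0.1 * rng.standard_normal((40, n)))
(J1, A1, E1), (J2, A2, E2) = jive_sketch(X1, X2)
print(np.linalg.norm(J1) > np.linalg.norm(A1))   # block 1 is mostly joint
print(np.linalg.norm(A2) > np.linalg.norm(E2))   # block 2 has individual signal
```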
The behaviors of one-dimensional quantum random walks are strikingly different from those of classical ones. However, when decoherence is involved, the limiting distributions take on many classical features over time. In this paper, we study the decoherence on both position and "coin" spaces of the particle. We propose a new analytical approach to investigate these phenomena and obtain the generating functions which encode all the features of these walks. Specifically, from these generating functions, we find exact analytic expressions of several moments for the time and noise dependence of position. Moreover, the limiting position distributions of decoherent quantum random walks are shown to be Gaussian in an analytical manner. These results explicitly describe the relationship between the system and the level of decoherence.
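The contrast the abstract describes, ballistic spreading without decoherence versus classical diffusive spreading with it, can be sketched by simulating a Hadamard walk in which the coin is projectively measured with some probability at each step. This Monte Carlo illustration is an assumption-laden stand-in for the paper's generating-function analysis; the horizon and trajectory counts are arbitrary choices.

```python
# 1-D Hadamard quantum walk with coin decoherence: with probability
# p_measure per step, the coin is projectively measured (Born rule).
import numpy as np

def walk_variance(T, p_measure, rng, trajectories=1):
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Hadamard coin
    avg = np.zeros(2 * T + 1)                        # averaged position dist.
    for _ in range(trajectories):
        psi = np.zeros((2 * T + 1, 2), dtype=complex)
        psi[T] = [1 / np.sqrt(2), 1j / np.sqrt(2)]   # symmetric initial coin
        for _ in range(T):
            psi = psi @ H.T                          # coin flip
            new = np.zeros_like(psi)
            new[1:, 0] = psi[:-1, 0]                 # coin 0 moves right
            new[:-1, 1] = psi[1:, 1]                 # coin 1 moves left
            psi = new
            if rng.random() < p_measure:             # decohere the coin
                probs = np.sum(np.abs(psi) ** 2, axis=0)
                c = int(rng.random() < probs[1])     # Born-rule outcome
                psi[:, 1 - c] = 0
                psi /= np.sqrt(probs[c])
        avg += np.sum(np.abs(psi) ** 2, axis=1)
    avg /= trajectories
    pos = np.arange(-T, T + 1)
    mean = np.sum(pos * avg)
    return float(np.sum((pos - mean) ** 2 * avg))

rng = np.random.default_rng(3)
T = 30
var_coherent = walk_variance(T, 0.0, rng)            # ballistic: var ~ T^2
var_decoherent = walk_variance(T, 1.0, rng, 300)     # classical: var ~ T
print(var_coherent, var_decoherent)
```

With the coin measured every step, the walk reduces to a symmetric classical random walk with variance close to T, while the coherent walk's variance grows quadratically, matching the classical-features-under-decoherence point of the abstract.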