The smoothly clipped absolute deviation (SCAD) estimator, proposed by Fan and Li, has many desirable properties, including continuity, sparsity, and unbiasedness. The SCAD estimator also has the (asymptotic) oracle property when the dimension of covariates is fixed or diverges more slowly than the sample size. In this article we study the SCAD estimator in high-dimensional settings where the dimension of covariates can be much larger than the sample size. First, we develop an efficient optimization algorithm that is fast and always converges to a local minimum. Second, we prove that the SCAD estimator still has the oracle property in high-dimensional problems. We perform numerical studies to compare the SCAD estimator with the LASSO and SIS-SCAD estimators in terms of prediction accuracy and variable selectivity when the true model is sparse. Through simulation, we show that the variance estimator of Fan and Li still works well for some limited high-dimensional cases where the true nonzero coefficients are not too small and the sample size is moderately large. We apply the proposed algorithm to analyze a high-dimensional microarray data set.
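For readers unfamiliar with the penalty itself, the following is a minimal sketch of the standard SCAD penalty and its univariate thresholding rule as defined by Fan and Li, using their suggested default a = 3.7. It is not the optimization algorithm developed in the article; the function names `scad_penalty` and `scad_threshold` are illustrative, and the thresholding rule assumes a single coefficient with an orthonormal (unit-norm) design and a > 2.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lam(|t|): linear near zero, quadratic in between, flat beyond a*lam."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    const = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

def scad_threshold(z, lam, a=3.7):
    """Minimizer of 0.5*(z - b)^2 + p_lam(|b|) over b (assumes a > 2)."""
    z = np.asarray(z, dtype=float)
    absz = np.abs(z)
    soft = np.sign(z) * np.maximum(absz - lam, 0.0)             # LASSO-like region
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)        # transition region
    return np.where(absz <= 2 * lam, soft, np.where(absz <= a * lam, mid, z))
```

The flat region beyond a·lam is what leaves large coefficients unshrunk (the unbiasedness property), while the soft-thresholding region near zero produces sparsity; the two pieces join continuously at 2·lam and a·lam.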
Digital image analysis of T2-weighted MR images showed significantly greater fat infiltration of the paravertebral back muscles in patients with degenerative lumbar flat back than in normal controls. Digital image analysis of the paravertebral back muscles is a useful tool for measuring the degree of paravertebral back muscle degeneration.
We investigate high-dimensional non-convex penalized regression, where the number of covariates may grow at an exponential rate. Although recent asymptotic theory established that there exists a local minimum possessing the oracle property under general conditions, it is still largely an open problem how to identify the oracle estimator among potentially multiple local minima. There are two main obstacles: (1) due to the presence of multiple minima, the solution path is nonunique and is not guaranteed to contain the oracle estimator; (2) even if a solution path is known to contain the oracle estimator, the optimal tuning parameter depends on many unknown factors and is hard to estimate. To address these two challenging issues, we first prove that an easy-to-calculate calibrated CCCP algorithm produces a consistent solution path which contains the oracle estimator with probability approaching one. Furthermore, we propose a high-dimensional BIC criterion and show that it can be applied to the solution path to select the optimal tuning parameter which asymptotically identifies the oracle estimator. The theory for a general class of non-convex penalties in the ultra-high-dimensional setup is established when the random errors follow a sub-Gaussian distribution. Monte Carlo studies confirm that the calibrated CCCP algorithm combined with the proposed high-dimensional BIC has desirable performance in identifying the underlying sparsity pattern for high-dimensional data analysis.
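The tuning-parameter selection step can be illustrated with a rough sketch. The abstract does not state the criterion's formula, so the form used here, log(RSS/n) + |M| · C_n · log(p) / n with C_n = log(log n), is an assumption about what a high-dimensional BIC of this kind looks like. The helper names `hbic` and `select_by_hbic` are hypothetical, and the solution path is taken as a given list of coefficient vectors rather than computed by the calibrated CCCP algorithm.

```python
import numpy as np

def hbic(y, X, beta, c_n=None):
    """Assumed high-dimensional BIC: log(RSS/n) + |support| * C_n * log(p) / n."""
    n, p = X.shape
    if c_n is None:
        c_n = np.log(np.log(n))  # assumed slowly diverging factor; requires n > e
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / n
    k = int(np.count_nonzero(beta))
    return np.log(sigma2) + k * c_n * np.log(p) / n

def select_by_hbic(y, X, path):
    """Return the index of the path entry with the smallest criterion, plus all scores."""
    scores = [hbic(y, X, b) for b in path]
    return int(np.argmin(scores)), scores
```

Because the penalty term scales with log(p) rather than log(n), each extra selected covariate costs more as the dimension grows, which is the intuition behind using such a criterion when p far exceeds n.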