With the advancement of technology, the analysis of large-scale gene expression data has become feasible and very popular in the era of machine learning. This paper develops an improved ridge approach for genome regression modeling. When multicollinearity exists in a data set that also contains outliers, we consider a robust ridge estimator, namely the rank ridge regression estimator, for parameter estimation and prediction. However, the efficiency of the rank ridge regression estimator is highly dependent on the ridge parameter, and in general it is difficult to give a satisfactory answer to the question of how to select it. Because of the good properties of generalized cross validation (GCV) and its simplicity, we use it to choose the optimum value of the ridge parameter. The GCV function balances the precision of the estimators against the bias introduced by ridge estimation. It behaves like an improved estimator of risk and can be used in high-dimensional problems where the number of explanatory variables is larger than the sample size. Finally, some numerical illustrations are given to support our findings.
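The GCV-based choice of the ridge parameter described above can be sketched as follows. This is a minimal illustration, not the paper's method: it uses the ordinary least-squares ridge estimator rather than the rank ridge regression estimator, and the function names `gcv_ridge` and `select_k` and the candidate grid are assumptions introduced here. The score computed is GCV(k) = n · RSS(k) / (n − tr H_k)², where H_k = X(XᵀX + kI)⁻¹Xᵀ is the ridge hat matrix.

```python
import numpy as np

def gcv_ridge(X, y, ks):
    """GCV score for each candidate ridge parameter k in ks (all k > 0).

    Illustrative only: ordinary ridge, not the rank ridge estimator.
    """
    n = X.shape[0]
    # Thin SVD gives H_k = U diag(d^2 / (d^2 + k)) U^T without refitting per k.
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    scores = []
    for k in ks:
        shrink = d**2 / (d**2 + k)        # eigenvalues of the hat matrix H_k
        fitted = U @ (shrink * Uty)       # H_k y
        rss = np.sum((y - fitted) ** 2)   # residual sum of squares
        edf = np.sum(shrink)              # tr(H_k), effective degrees of freedom
        scores.append(n * rss / (n - edf) ** 2)
    return np.array(scores)

def select_k(X, y, ks):
    """Return the candidate k with the smallest GCV score, plus all scores."""
    scores = gcv_ridge(X, y, ks)
    return ks[int(np.argmin(scores))], scores
```

Because tr(H_k) stays strictly below n for any k > 0, the score remains defined even when the number of columns of X exceeds the number of rows, which is the high-dimensional setting the abstract mentions. A typical call would pass a logarithmic grid such as `np.logspace(-3, 2, 50)` as `ks`.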
In this paper, a generalized difference-based estimator is introduced for the vector parameter β in a partially linear model when the errors are correlated. A generalized difference-based almost unbiased ridge estimator is defined for the vector parameter β. Under the linear stochastic constraint r = Rβ + e, a new generalized difference-based weighted mixed almost unbiased ridge estimator is proposed. The performance of this estimator relative to the generalized difference-based weighted mixed estimator, the generalized difference-based estimator, and the generalized difference-based almost unbiased ridge estimator is investigated in terms of the mean square error matrix criterion. Then, a method to select the biasing parameter k and the nonstochastic weight ω is considered. The efficiency properties of the new estimator are illustrated by a simulation study. Finally, the performance of the new estimator is evaluated on a real dataset.
Keywords: Difference-based estimator • Generalized ridge estimator • Generalized difference-based weighted mixed almost unbiased ridge estimator • Partially linear model • Weighted mixed estimator
Abstract: In this paper, ridge and non-ridge type shrinkage estimators and their positive parts are defined in the semiparametric regression model when the errors are dependent and some non-stochastic linear restrictions are imposed under a multicollinearity setting. The exact risk expressions, in addition to the biases, are derived for the estimators under study, and the region of optimality of each estimator is exactly determined. Also, necessary and sufficient conditions on the ridge parameter k are obtained for the superiority of the ridge type estimator over its counterpart. Lastly, a simulation study and a real data analysis are performed to illustrate the efficiency of the proposed estimators based on the minimum risk criterion. In this regard, kernel smoothing and modified cross-validation methods are used for estimating the non-parametric function.