2022
DOI: 10.1002/cem.3393

Liu regression after random forest for prediction and modeling in high dimension

Abstract: In the modern era, using advanced technology, we have access to data with many features, and therefore, feature engineering has become a vital task in data analysis. One of the challenges in model estimation is to combat multicollinearity in high‐dimensional data problems where the number of features (p) exceeds the number of samples (n). We propose a novel, yet simple, strategy to estimate the regression parameters in a high‐dimensional regime in the presence of multicollinearity. The proposed approach enjoys…

Cited by 5 publications (12 citation statements)
References 39 publications (32 reference statements)
“…The goodness of fit function (goof) was used for cross‐validation to obtain the determination coefficient ($R^2$), consistency coefficient (CC), mean error (ME), and root mean square error (RMSE) as evaluation indicators for model prediction. The formulae are as follows (Arashi et al., 2022; Lindner et al., 2015):

$$R^2 = \sum_{i=1}^{N} \left\{ p(x_i) - \bar{p} \right\}^2 \Bigg/ \sum_{i=1}^{N} \left\{ v(x_i) - \bar{p} \right\}^2$$

$$\mathrm{CC} = \frac{2 \cdot R \cdot \sigma_v \cdot \sigma_p}{\sigma_p^2 + \sigma_v^2 + \left( \bar{v} - \bar{p} \right)^2}$$

$$\mathrm{ME} = \frac{1}{N} \sum_{i=1}^{N} \left\{ p(x_i) - v(x_i) \right\} \ldots$$ …”
Section: Methods
Mentioning confidence: 99%
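The quoted indicators translate directly into a few lines of NumPy. Below is a minimal sketch, assuming `p` holds the model predictions $p(x_i)$ and `v` the validation (observed) values $v(x_i)$; the sign convention in ME (predicted minus observed) is an assumption, since the quoted formula is truncated at that point.

```python
# Minimal sketch of the quoted evaluation indicators (R2, CC, ME, RMSE).
# Notation follows the citation statement: p = predictions p(x_i),
# v = validation (observed) values v(x_i). The ME sign convention is assumed.
import numpy as np

def goodness_of_fit(p: np.ndarray, v: np.ndarray) -> dict:
    p_bar, v_bar = p.mean(), v.mean()
    # R2 exactly as written above: spread of predictions about p_bar
    # over spread of validation values about p_bar.
    r2 = np.sum((p - p_bar) ** 2) / np.sum((v - p_bar) ** 2)
    # Concordance coefficient (Lin's CC); R is the Pearson correlation of v and p.
    r = np.corrcoef(v, p)[0, 1]
    cc = (2 * r * v.std() * p.std()) / (p.var() + v.var() + (v_bar - p_bar) ** 2)
    me = np.mean(p - v)                    # mean error (assumed: predicted - observed)
    rmse = np.sqrt(np.mean((p - v) ** 2))  # root mean square error
    return {"R2": r2, "CC": cc, "ME": me, "RMSE": rmse}
```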
“…32–35 Random forests can be used to naturally rank the importance of variables in a regression or classification problem, and they are implemented in the R package randomForest and the Python library scikit-learn (sklearn). 36,37 Recently, Arashi et al 21 selected important variables with the use of random forests: the high-dimensional data were reduced to low-dimensional data, and the regression parameters were then estimated from the reduced data using the Liu estimator, as sketched below.…”
Section: Proposed Methodology
Mentioning confidence: 99%
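As a rough illustration of that two-stage workflow, here is a hedged Python sketch: a random forest ranks the features via `feature_importances_`, the top k are kept, and a shrinkage regression is fitted on the reduced design. scikit-learn ships no Liu estimator, so `Ridge` stands in for the second stage here (a Liu-estimator sketch follows after the next statement); the toy data, k, and alpha are illustrative, not the paper's settings.

```python
# Two-stage sketch: random-forest variable ranking, then a shrinkage fit
# on the reduced design. Ridge is a placeholder for the Liu estimator,
# which scikit-learn does not provide.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p, k = 50, 200, 10                         # p > n: high-dimensional toy data
X = rng.normal(size=(n, p))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.1, size=n)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:k]   # k most important features

model = Ridge(alpha=1.0).fit(X[:, top], y)    # low-dimensional second stage
print(model.coef_)
```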
“…where the response variable $y$ is an $n \times 1$ vector, the predictors $x_i$, $i = 1, \ldots, n$, form a known $n \times p$ matrix $X \in \mathbb{R}^{n \times p}$ such that $p > n$, the regression coefficient $\beta$ is a $p \times 1$ vector, and the random errors $\epsilon_i$, $i = 1, \ldots, n$, are assumed to be normally distributed with mean $0$ and variance $\sigma^2 \in \mathbb{R}^{+}$. In low-dimensional settings, when $p < n$, $\beta$ is popularly estimated by the ordinary least squares (OLS) estimator, which minimizes the squared $L_2$ norm $\lVert y - X\beta \rVert_2^2$ with respect to $\beta$ but fails to give a unique estimate in high-dimensional settings when $p > n$. 21 Another threat to the performance of OLS is multicollinearity, which surfaces as a result of correlation or linear dependency among the predictors. 22–27 Biased estimators such as the ridge regression estimator, 28 the Liu estimator, 29 the modified ridge-type estimator, 30 the Kibria-Lukman (KL) estimator, 31 the robust principal component (PC)-ridge estimator, 24 the JKL estimator, 22 and others were developed to account for the multicollinearity problem in linear regression models.…”
Section: Theoretical Background
Mentioning confidence: 99%
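For concreteness, a minimal sketch of the Liu estimator itself, in the classical $p < n$ setting where OLS exists: $\hat{\beta}(d) = (X^\top X + I_p)^{-1}(X^\top y + d\,\hat{\beta}_{\mathrm{OLS}})$ with shrinkage parameter $0 < d < 1$ (at $d = 1$ it reduces to OLS). This shows the estimator's form only; the paper's strategy is to apply it after the random-forest reduction, so that the second stage is low-dimensional.

```python
# Minimal sketch of the Liu (1993) estimator for the classical p < n case:
# beta_hat(d) = (X'X + I)^{-1} (X'y + d * beta_OLS), with 0 < d < 1.
# The choice of d is a tuning decision; d = 0.5 below is arbitrary.
import numpy as np

def liu_estimator(X: np.ndarray, y: np.ndarray, d: float = 0.5) -> np.ndarray:
    p = X.shape[1]
    XtX = X.T @ X
    beta_ols = np.linalg.solve(XtX, X.T @ y)  # requires full column rank (p < n)
    return np.linalg.solve(XtX + np.eye(p), X.T @ y + d * beta_ols)
```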