2022
DOI: 10.1002/cem.3393

Liu regression after random forest for prediction and modeling in high dimension

Abstract: In the modern era, using advanced technology, we have access to data with many features, and therefore, feature engineering has become a vital task in data analysis. One of the challenges in model estimation is to combat multicollinearity in high‐dimensional data problems where the number of features (p) exceeds the number of samples (n). We propose a novel, yet simple, strategy to estimate the regression parameters in a high‐dimensional regime in the presence of multicollinearity. The proposed approach enjoys…

Cited by 5 publications (12 citation statements)
References 39 publications (32 reference statements)
“…The goodness of fit function (goof) was used for cross‐validation to obtain the determination coefficient ($R^2$), consistency coefficient (CC), mean error (ME), and root mean square error (RMSE) as evaluation indicators for model prediction. The formulae are as follows (Arashi et al., 2022; Lindner et al., 2015):

$$R^2 = \sum_{i=1}^{N} \left\{ p(x_i) - \bar{p} \right\}^2 \Bigg/ \sum_{i=1}^{N} \left\{ v(x_i) - \bar{p} \right\}^2$$

$$\mathrm{CC} = \frac{2 \cdot R \cdot \sigma_v \cdot \sigma_p}{\sigma_p^2 + \sigma_v^2 + \left( \bar{v} - \bar{p} \right)^2}$$

$$\mathrm{ME} = \frac{1}{N} \sum_{i=1}^{N} \left\{ p(x_i) - v(x_i) \right\} \ldots$$ …”
Section: Methods
Mentioning confidence: 99%
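The quoted indicators translate directly into a few lines of NumPy. Below is a minimal sketch, assuming `p` holds the model predictions $p(x_i)$ and `v` the validation (observed) values $v(x_i)$; the sign convention in ME (predicted minus observed) is an assumption, since the quoted formula is truncated at that point.

```python
# Minimal sketch of the quoted evaluation indicators (R2, CC, ME, RMSE).
# Notation follows the citation statement: p = predictions p(x_i),
# v = validation (observed) values v(x_i). The ME sign convention is assumed.
import numpy as np

def goodness_of_fit(p: np.ndarray, v: np.ndarray) -> dict:
    p_bar, v_bar = p.mean(), v.mean()
    # R2 exactly as written above: spread of predictions about p_bar
    # over spread of validation values about p_bar.
    r2 = np.sum((p - p_bar) ** 2) / np.sum((v - p_bar) ** 2)
    # Concordance coefficient (Lin's CC); R is the Pearson correlation of v and p.
    r = np.corrcoef(v, p)[0, 1]
    cc = (2 * r * v.std() * p.std()) / (p.var() + v.var() + (v_bar - p_bar) ** 2)
    me = np.mean(p - v)                    # mean error (assumed: predicted - observed)
    rmse = np.sqrt(np.mean((p - v) ** 2))  # root mean square error
    return {"R2": r2, "CC": cc, "ME": me, "RMSE": rmse}
```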
“…32–35 Random forests can be used to naturally rank the importance of variables in a regression or classification problem, and they are implemented in the R package randomForest and the Python library scikit-learn (sklearn). 36,37 Recently, Arashi et al 21 selected important variables with the use of random forests: the high-dimensional data were reduced to low-dimensional data, and the regression parameters were then estimated from the reduced data using the Liu estimator, as sketched below.…”
Section: Proposed Methodology
Mentioning confidence: 99%
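As a rough illustration of that two-stage workflow, here is a hedged Python sketch: a random forest ranks the features via `feature_importances_`, the top k are kept, and a shrinkage regression is fitted on the reduced design. scikit-learn ships no Liu estimator, so `Ridge` stands in for the second stage here (a Liu-estimator sketch follows after the next statement); the toy data, k, and alpha are illustrative, not the paper's settings.

```python
# Two-stage sketch: random-forest variable ranking, then a shrinkage fit
# on the reduced design. Ridge is a placeholder for the Liu estimator,
# which scikit-learn does not provide.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p, k = 50, 200, 10                         # p > n: high-dimensional toy data
X = rng.normal(size=(n, p))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.1, size=n)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:k]   # k most important features

model = Ridge(alpha=1.0).fit(X[:, top], y)    # low-dimensional second stage
print(model.coef_)
```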
“…where the response variable $y$ is an $n \times 1$ vector, the predictors $x_i$, $i = 1, \ldots, n$, form a known $n \times p$ matrix $X \in \mathbb{R}^{n \times p}$ such that $p > n$, the regression coefficient $\beta$ is a $p \times 1$ vector, and the random errors $\epsilon_i$, $i = 1, \ldots, n$, are assumed to be normally distributed with mean $0$ and variance $\sigma^2 \in \mathbb{R}^{+}$. In low-dimensional settings, when $p < n$, $\beta$ is popularly estimated by the ordinary least squares (OLS) estimator, which minimizes the squared $L_2$ norm $\lVert y - X\beta \rVert_2^2$ with respect to $\beta$ but fails to give a unique estimate in high-dimensional settings when $p > n$. 21 Another threat to the performance of OLS is multicollinearity, which surfaces as a result of correlation or linear dependency among the predictors. 22–27 Biased estimators such as the ridge regression estimator, 28 the Liu estimator, 29 the modified ridge-type estimator, 30 the Kibria-Lukman (KL) estimator, 31 the robust principal component (PC)-ridge estimator, 24 the JKL estimator, 22 and others were developed to account for the multicollinearity problem in linear regression models.…”
Section: Theoretical Background
Mentioning confidence: 99%
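For concreteness, a minimal sketch of the Liu estimator itself, in the classical $p < n$ setting where OLS exists: $\hat{\beta}(d) = (X^\top X + I_p)^{-1}(X^\top y + d\,\hat{\beta}_{\mathrm{OLS}})$ with shrinkage parameter $0 < d < 1$ (at $d = 1$ it reduces to OLS). This shows the estimator's form only; the paper's strategy is to apply it after the random-forest reduction, so that the second stage is low-dimensional.

```python
# Minimal sketch of the Liu (1993) estimator for the classical p < n case:
# beta_hat(d) = (X'X + I)^{-1} (X'y + d * beta_OLS), with 0 < d < 1.
# The choice of d is a tuning decision; d = 0.5 below is arbitrary.
import numpy as np

def liu_estimator(X: np.ndarray, y: np.ndarray, d: float = 0.5) -> np.ndarray:
    p = X.shape[1]
    XtX = X.T @ X
    beta_ols = np.linalg.solve(XtX, X.T @ y)  # requires full column rank (p < n)
    return np.linalg.solve(XtX + np.eye(p), X.T @ y + d * beta_ols)
```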