Variable Importance Analysis in Imbalanced Datasets: A New Approach

Dfuf, Ismael Ahrazem; Perez-Minayo, Joaquin Forte; McWilliams, José Manuel Mira; Fernández, Camino

doi:10.1109/access.2020.3008416

Cited by 8 publications

(4 citation statements)

References 87 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…RF is a nonparametric ML approach for multiclass classification and regression problems [95]. The RF is a wellrecognized and hugely successful bagging type of decision tree ensemble.…”

Section: Random Forestsmentioning

confidence: 99%

A Machine Learning-Based Surrogate Finite Element Model for Estimating Dynamic Response of Mechanical Systems

2023

View full text Add to dashboard Cite

An efficient approach for improving the predictive understanding of dynamic mechanical system variability is developed in this work. The approach requires low model assessment time through the fitting of surrogate models. ML-based surrogate algorithms for finite element analysis (FEA) are developed in this study to accelerate FEA and prevent rerunning complex simulations. The research begins with an overview of the recent novelties in ML algorithms applied to finite element (FE) and other physics-based computational schemes. To predict the time-varying response variables, that is, the displacement of a twodimensional truss structure, a surrogate FE technique based on ML algorithms is developed. In this work, several ML regression algorithms, including decision trees (DTs) and deep neural networks, are developed, and their efficacies are compared. In this study, the ML-based surrogate FE models are able to effectively predict the response of the truss structure in two dimensions over the entire structure. Extreme gradientboosting DTs provide more precise outcomes and outperform other ML algorithms.INDEX TERMS Surrogate modeling, finite element analysis, mechanical system analysis, machine learning, artificial neural networks, random forest trees, gradient boosting regression trees, adaptive boosting trees.

show abstract

“…RF is a nonparametric ML approach for multiclass classification and regression problems [95]. The RF is a wellrecognized and hugely successful bagging type of decision tree ensemble.…”

Section: Random Forestsmentioning

confidence: 99%

A Machine Learning-Based Surrogate Finite Element Model for Estimating Dynamic Response of Mechanical Systems

2023

View full text Add to dashboard Cite

show abstract

“…One of the essential outputs of a machine learning algorithm is variable importance (VI) [1]- [2]. This VI indicates the importance of the predictor variables.…”

Section: Introductionmentioning

confidence: 99%

An optimal variable importance for machine learning classification models using modified simulated annealing algorithm

Rusyana,

Wigena,

Sumertajaya

et al. 2024

IOP Conf. Ser.: Earth Environ. Sci.

View full text Add to dashboard Cite

Each machine learning model will generate a different importance variable even though the method used is the same. Interpreting the variable significance is confusing. This study proposes combining several variable importance measures using a simulated annealing algorithm with an initial solution of mean and mode. The study uses simulation and empirical data. The simulation data are divided into three scenarios: no correlation, moderate correlation, and high correlation among predictor variables. The empirical data consist of 24 predictor variables. The machine learning models are classification models of random forest, extreme gradient boosting, neural network, and support vector machine. Based on the simulation data study, the combined variable importance will be optimal when predictor variables have low correlation. The simulated annealing algorithms show convergent objective values around the 25th iteration in empirical data. The more predictor variables, the higher the accuracy of this variable importance. Accuracy is optimal when the number of predictors exceeds ten. The five most important variables in explaining family food insecurity are the education of the family head, the floor type of the house, the number of family members who have a savings account, ownership of land, and decent drinking water.

show abstract

“…As the machine learning accuracy highly dependant upon this stage, for imbalance datasets, accuracy performance becomes a challenging task. Variable importance measurement techniques described in [10], highlights usage of imbalance dataset and shows out performance of proposed method. To achieve high dimensional selection consistency in decision tree algorithm, researchers have presented model selection algorithm named DSTUMP that outperforms in nonlinear additive model settings [10].…”

Section: Introductionmentioning

confidence: 99%

“…Importance combined with prior knowledge parameters to select features, when applied to soft measuring model, these features have shown increase in performance, as stated in [18]. Similarly permutation based framework, a dissimilarities based algorithm is proposed by [10] researchers that computes variable importance using distribution of misclassification errors. In the area of image classification, researchers have proposed method of quantifying of variable importance, employing concept of game theory and metric of Shapely value which is applicable to any type of model [19].…”

Section: Introductionmentioning

confidence: 99%

Analysis of Variable Importance Measurement Techniques for Classification of Road Surfaces

Jawale¹,

Magar²

2023

Advances in Intelligent Systems Research

View full text Add to dashboard Cite

The term variable importance refers to the role of an attribute in making accurate predictions. A particular model, when relies majorly on multiple variables, increases variable importance of those variables in positive direction. Variable importance is applied to various classification and regression models using different methods. For example, in regression model, higher value Root Mean Squared Error (RMSE) is the indicator of high importance to that variable, whereas in classification model, higher number of splits associated with a variable determines its importance in the model. In this research study, we have considered a problem of road surface classification depending upon 17 variables associated with vehicle parameters. This is a multiclass classification problem. Different classification and regression models are used, and variable importance of each model is evaluated on the metrics like RMSE, Goodness of fit model. Outcome of this research study shows all models define a common set of 5 to 7 higher importance variable rankings to predict dependant variable.

show abstract

Variable Importance Analysis in Imbalanced Datasets: A New Approach

Cited by 8 publications

References 87 publications

A Machine Learning-Based Surrogate Finite Element Model for Estimating Dynamic Response of Mechanical Systems

A Machine Learning-Based Surrogate Finite Element Model for Estimating Dynamic Response of Mechanical Systems

An optimal variable importance for machine learning classification models using modified simulated annealing algorithm

Analysis of Variable Importance Measurement Techniques for Classification of Road Surfaces

Contact Info

Product

Resources

About