2020
DOI: 10.3390/min10050420

Process Variable Importance Analysis by Use of Random Forests in a Shapley Regression Framework

Abstract: Linear regression is often used as a diagnostic tool to understand the relative contributions of operational variables to some key performance indicator or response variable. However, owing to the nature of plant operations, predictor variables tend to be correlated, often highly so, and this can lead to significant complications in assessing the importance of these variables. Shapley regression is seen as the only axiomatic approach to deal with this problem but has almost exclusively been used with linear mo…
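The abstract refers to Shapley regression with linear models. As a minimal sketch of that classic setup (not the paper's random forest implementation), the Python below assumes ordinary least squares with an intercept and takes the in-sample R² of each subset of predictors as that subset's value; each predictor's share is its Shapley-weighted average marginal gain in R² over all subsets of the other predictors. The helper names (`solve`, `r_squared`, `shapley_r2`) and the toy data are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(X, y, cols):
    """In-sample R^2 of an OLS fit of y on the columns `cols` plus an intercept."""
    if not cols:
        return 0.0  # an intercept-only model explains no variance
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(Z[0])
    # Normal equations: (Z'Z) beta = Z'y
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    beta = solve(ztz, zty)
    sse = sum((yi - sum(b * zi for b, zi in zip(beta, z))) ** 2 for z, yi in zip(Z, y))
    return 1.0 - sse / sst


def shapley_r2(X, y):
    """Shapley decomposition of the full-model R^2 among the p predictors."""
    p = len(X[0])
    cache = {}

    def value(subset):
        key = tuple(sorted(subset))
        if key not in cache:
            cache[key] = r_squared(X, y, list(key))
        return cache[key]

    shares = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for s in combinations(others, size):
                total += weight * (value(s + (j,)) - value(s))
        shares.append(total)
    return shares


# Hypothetical toy data: x1 and x2 are highly correlated, x3 is not.
X = [[1, 1.1, 2], [2, 1.9, -1], [3, 3.2, 0], [4, 3.8, 3],
     [5, 5.1, -2], [6, 6.2, 1], [7, 6.9, 0], [8, 8.1, -1]]
y = [3.2, 3.4, 6.1, 9.4, 9.2, 12.6, 14.1, 15.3]
phi = shapley_r2(X, y)
print("Shapley R^2 shares:", [round(v, 3) for v in phi])
print("full-model R^2:", round(r_squared(X, y, [0, 1, 2]), 3))
```

By the efficiency property the shares sum to the full-model R², and because adding a predictor never lowers the in-sample R² of a nested OLS fit, no share is negative. Exact enumeration needs one fit per subset (2^p in total), so it is only practical for a handful of predictors.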

Cited by 42 publications (15 citation statements)
References: 32 publications
“…Furthermore, the Naive Bayes and Random Forest classifiers were selected to measure the importance of the features. Random Forest creates a forest of trees, and per tree measures a candidate feature’s ability to optimally split the instances into two classes using the Gini impurity [ 55 ]. Naive Bayes calculates the probability of each feature in order to evaluate their performance at predicting the output variable.…”
Section: Materials and Methods; citation type: mentioning; confidence: 99%
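The passage above describes impurity-based importance. A minimal sketch, assuming a classification target and midpoints between sorted distinct feature values as candidate thresholds (both illustrative choices), computes the Gini impurity 1 − Σ p_k² of a node and the weighted decrease a split achieves. In the usual formulation, a feature's Gini (mean decrease in impurity) importance in a random forest accumulates such decreases, weighted by the fraction of samples reaching each node, over every node that splits on that feature, and averages them across trees. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(x, y, threshold):
    """Parent impurity minus size-weighted child impurity for the split x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    child = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(y) - child


def best_split(x, y):
    """Threshold (midpoint between distinct values) with the largest impurity decrease."""
    values = sorted(set(x))
    best = (None, 0.0)
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        gain = impurity_decrease(x, y, t)
        if gain > best[1]:
            best = (t, gain)
    return best


x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
print(best_split(x, y))  # prints (3.5, 0.5)
```

The split at 3.5 separates the two classes perfectly, so both children are pure and the decrease equals the parent impurity of 0.5.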
“…All the data are represented by the variable N, and the selected data are represented by n. In addition, the variable p_i represents the square of the ratio of the selected data to the number of elements smaller than and larger than itself [16].…”
Section: Random Forest Method; citation type: unclassified
“…However, the Gini importance has a drawback. It is known to be biased towards input variables with continuous and discrete variable with high cardinality (Zhou and Hooker 2021 ; Aldrich 2020 ; Gómez-Ramírez et al 2020 ), as these variables provide high possibilities for tree splitting. To address this issue, Lundberg and Lee ( 2017 ) propose a method that is based on Shapley values (Hur et al 2017 ; Aldrich 2020 ).…”
Section: Post-modeling Analysis; citation type: mentioning; confidence: 99%
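The cardinality bias described above can be illustrated with a small simulation. This is a minimal sketch under assumed, illustrative settings (a 0/1 target that is pure noise, n = 40 samples, 300 trials, a fixed seed): a binary noise feature allows only one split, whereas a continuous noise feature offers up to n − 1 thresholds, so its best chance split tends to show a larger Gini decrease even though neither feature carries any information about the target. The function names are hypothetical.

```python
import random


def gini(labels):
    """Two-class Gini impurity 2p(1 - p) for 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of class 1
    return 2.0 * p * (1.0 - p)


def best_gain(x, y):
    """Largest Gini decrease over all thresholds between distinct values of x."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    parent = gini(y)
    best = 0.0
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate tied feature values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = parent - (i / n) * gini(left) - ((n - i) / n) * gini(right)
        best = max(best, gain)
    return best


def simulate(trials=300, n=40, seed=0):
    """Mean best-split Gini decrease for binary vs continuous noise features."""
    rng = random.Random(seed)
    binary_total = cont_total = 0.0
    for _ in range(trials):
        y = [rng.randint(0, 1) for _ in range(n)]      # target: pure noise
        x_bin = [rng.randint(0, 1) for _ in range(n)]  # 2 distinct values
        x_cont = [rng.random() for _ in range(n)]      # ~n distinct values
        binary_total += best_gain(x_bin, y)
        cont_total += best_gain(x_cont, y)
    return binary_total / trials, cont_total / trials


mean_bin, mean_cont = simulate()
print(f"mean best Gini decrease, binary noise:     {mean_bin:.4f}")
print(f"mean best Gini decrease, continuous noise: {mean_cont:.4f}")
```

Because a greedy tree picks the split with the highest decrease at each node, impurity-based importance accumulated over a forest inherits this preference for high-cardinality inputs.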
“…It is known to be biased towards input variables with continuous and discrete variable with high cardinality (Zhou and Hooker 2021 ; Aldrich 2020 ; Gómez-Ramírez et al 2020 ), as these variables provide high possibilities for tree splitting. To address this issue, Lundberg and Lee ( 2017 ) propose a method that is based on Shapley values (Hur et al 2017 ; Aldrich 2020 ). Stemming from game theory, Shapley values provide a theoretically justified way to fairly allocate a coalition’s output among members in the coalition (Shapley 1953 ).…”
Section: Post-modeling Analysis; citation type: mentioning; confidence: 99%
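To make the coalition analogy concrete, the sketch below computes exact Shapley values for a single prediction of a hypothetical model, treating the features as players and the prediction as the coalition's output. It assumes a simplified value function in which features outside a coalition are set to a fixed baseline; SHAP as proposed by Lundberg and Lee instead defines a coalition's value through an expected model output, and tree-specific algorithms avoid the exponential enumeration used here. `model`, `coalition_value` and `shapley_values` are illustrative names, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def model(x):
    """Hypothetical model: an additive term plus a pairwise interaction."""
    return 3 * x[0] + 2 * x[1] * x[2]


def coalition_value(f, x, baseline, coalition):
    """Evaluate f with features in `coalition` taken from x and the rest from baseline."""
    z = [x[i] if i in coalition else baseline[i] for i in range(len(x))]
    return f(z)


def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating every coalition (2^p model calls per feature)."""
    p = len(x)
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = Fraction(0)
        for size in range(p):
            weight = Fraction(factorial(size) * factorial(p - size - 1), factorial(p))
            for s in combinations(others, size):
                gain = (coalition_value(f, x, baseline, set(s) | {j})
                        - coalition_value(f, x, baseline, set(s)))
                total += weight * gain
        phi.append(total)
    return phi


x = [1, 2, 3]
baseline = [0, 0, 0]
phi = shapley_values(model, x, baseline)
print([float(v) for v in phi])  # prints [3.0, 6.0, 6.0]
# Efficiency: the attributions sum to f(x) - f(baseline).
print(sum(phi) == model(x) - model(baseline))  # prints True
```

The additive term's contribution of 3 goes entirely to feature 0, while the interaction term's 12 is split equally between features 1 and 2, which is the symmetry axiom at work.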