2017
DOI: 10.1155/2017/6817627

A Variable Impacts Measurement in Random Forest for Mobile Cloud Computing

Abstract: Recently, the importance of mobile cloud computing has increased. Mobile devices can collect personal data from various sensors within a short period of time, and this sensor-based data contains valuable information about users. Advanced computation power and data analysis technology based on cloud computing provide an opportunity to classify massive sensor data into given labels. The random forest algorithm is known as a black-box model whose internal process is difficult to interpret. In this paper, we pr…

Cited by 45 publications (23 citation statements)
References 14 publications
“…Variable importance was verified based on the calculated mean decrease in accuracy [16] (in the case of RF) and Olden’s method output [17] (in the case of ANN). The data was split randomly into two groups: one half of the data (50%) comprised the training set (used for model learning), while the other half was used for testing (the previously developed model was used to make predictions on this new data; the observed rates of false positive, false negative, true positive, and true negative were used to evaluate performance by class).…”
Section: Methods
Mentioning (confidence: 99%)
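A minimal sketch of the procedure quoted above (a random 50/50 split, mean-decrease-in-accuracy importance, and class-wise evaluation from the confusion matrix), assuming scikit-learn's permutation_importance as a stand-in for the randomForest MDA routine cited as [16]; the dataset and parameters are synthetic placeholders, not those of the citing study.

# Random forest with permutation-based mean decrease in accuracy (MDA)
# on a synthetic dataset; 50% training / 50% testing as described above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

# MDA: how much test accuracy drops when each feature is permuted.
mda = permutation_importance(
    rf, X_test, y_test, scoring="accuracy", n_repeats=30, random_state=0
)
for idx in mda.importances_mean.argsort()[::-1]:
    print(f"feature {idx}: {mda.importances_mean[idx]:.4f}")

# Class-wise evaluation: true negative, false positive, false negative, true positive.
tn, fp, fn, tp = confusion_matrix(y_test, rf.predict(X_test)).ravel()
print(tn, fp, fn, tp)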
“…While we have attempted to reduce training time and the potential for overfitting with careful feature selection methods, random forest modeling has inherent limitations, which include high model complexity requiring computational resources and longer training periods than other machine learning frameworks. We use Mean Decrease Accuracy for feature selection, which has been known to have limitations due to the multicollinearity problem (variable impact calculation is less accurate when there are high numbers of correlated variables) (Hur et al., 2017). Future studies with significantly larger sample sizes will be required to improve upon this framework for general student athlete injury risk.…”
Section: Discussion
Mentioning (confidence: 99%)
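The multicollinearity caveat in the statement above can be illustrated with a small synthetic sketch (all data and settings here are illustrative assumptions): when two features are near-duplicates, the forest can fall back on one while the other is permuted, so permutation-style MDA understates each feature's individual impact.

# Two nearly identical informative features plus one noise feature: permuting
# either informative feature alone barely hurts accuracy, so its MDA is diluted.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)
x_a = signal + 0.01 * rng.normal(size=n)   # informative feature
x_b = signal + 0.01 * rng.normal(size=n)   # near-duplicate of x_a
noise = rng.normal(size=n)                 # irrelevant feature
X = np.column_stack([x_a, x_b, noise])
y = (signal > 0).astype(int)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
result = permutation_importance(rf, X, y, n_repeats=20, random_state=0)
print(result.importances_mean)  # x_a and x_b split importance that neither shows alone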
“…In future studies, gradient tree boosting, another tree-based machine learning model, could be considered to reduce the computational resources necessary for random forest modeling. Additionally, new methods of feature selection that solve the multicollinearity problem, such as the Shapley Value method (Hur et al., 2017), could be considered for feature selection in future studies.…”
Section: Discussion
Mentioning (confidence: 99%)
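A Shapley-value-based feature ranking of the kind pointed to above could look like the following sketch; it relies on the third-party shap package, which is an assumption of this illustration (the cited work's exact Shapley computation may differ), and the dataset is again a synthetic placeholder.

# Rank features by mean absolute Shapley value from a tree explainer.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X)
# shap's return shape varies by version: a list of per-class arrays, or a
# (samples, features) / (samples, features, classes) array.
vals = shap_values[1] if isinstance(shap_values, list) else shap_values
if vals.ndim == 3:
    vals = vals[:, :, 1]
importance = np.abs(vals).mean(axis=0)     # mean |SHAP| per feature
print(importance.argsort()[::-1])          # features ranked most to least important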
“…By doing so, the importance of a variable in predicting the response is quantified by evaluating the difference of how much including or excluding that variable decreases or increases accuracy [18][19][20]. This difference is referred to as the Mean Decrease Accuracy (MDA), and is computed by the formula shown in Equation 3 [21,22].…”
Section: Unsupervised Variable Selection
Mentioning (confidence: 99%)
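The excerpt does not reproduce Equation 3 itself; under the usual out-of-bag (OOB) permutation scheme, a standard formulation of the mean decrease accuracy for a variable x_j, which may differ from the cited equation in normalization, is:

% Common MDA formulation (not necessarily the cited Equation 3): the average,
% over all trees, of the increase in OOB error when x_j is randomly permuted.
\[
\mathrm{MDA}(x_j) \;=\; \frac{1}{n_{\mathrm{tree}}}
\sum_{t=1}^{n_{\mathrm{tree}}}
\left( \mathrm{err}^{\mathrm{OOB}}_{t,\pi_j} - \mathrm{err}^{\mathrm{OOB}}_{t} \right)
\]
% where err^OOB_t is the out-of-bag error of tree t and err^OOB_{t,pi_j} is that
% error after randomly permuting the values of x_j; the sum is often divided by the
% standard deviation of the per-tree differences to normalize the score.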