2020
DOI: 10.1002/isaf.1483

Modelling unbalanced catastrophic health expenditure data by using machine‐learning methods

Abstract: This study aims to compare the performances of logistic regression and random forest classifiers in a balanced oversampling procedure for the prediction of households that will face catastrophic out‐of‐pocket (OOP) health expenditure. Data were derived from the nationally representative household budget survey collected by the Turkish Statistical Institute for the year 2012. A total of 9,987 households returned valid surveys. The data set was highly imbalanced, and the percentage of households facing c…
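The abstract mentions a balanced oversampling procedure without spelling it out in the visible text. As an illustration only, the sketch below shows plain random oversampling, one common way to balance classes; it is not necessarily the procedure used in the study. The function name `random_oversample` and the toy data are hypothetical.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap; empty for the majority class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (assumption): 5 positive households out of 100.
rows = [[i] for i in range(100)]
labels = [1 if i < 5 else 0 for i in range(100)]

bal_rows, bal_labels = random_oversample(rows, labels)
print(bal_labels.count(0), bal_labels.count(1))
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.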

Cited by 4 publications (3 citation statements)
References 50 publications (93 reference statements)
“…Basic medical insurance does not protect households from catastrophic out-of-pocket (OOP) health expenditures. To reduce inequality, it would be beneficial to utilize big data tools and techniques to effectively screen poor households and strengthen the social and medical aid system for them [ 54 57 ].…”
Section: Results (mentioning)
confidence: 99%
“…If the prediction model has achieved an accuracy of 99%, the test results can be considered accurate regardless of whether they are positive or negative. This behavior can cause a problem with highly imbalanced datasets, where the model's accuracy would be almost 100% even without the ability to identify one positive case (Cinaroglu, 2020).…”
Section: Methods (mentioning)
confidence: 99%
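The point made in the statement above about accuracy on highly imbalanced data can be checked with a small calculation. This is a minimal sketch on made-up labels (990 negatives and 10 positives, an assumption chosen for illustration), with hypothetical helper functions `accuracy` and `recall`. A model that always predicts the negative class scores 99% accuracy while identifying no positive case.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive cases that the model identified."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Made-up labels (assumption): 1% of cases are positive.
y_true = [0] * 990 + [1] * 10

# A trivial "model" that always predicts the majority (negative) class.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

This is why recall, precision, or similar class-sensitive measures are usually reported alongside accuracy when the positive class is rare.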
“…The explanatory variables were selected from an extensive review of the literature, essentially comprising sociodemographic characteristics [8,12,18,21,25,52–57]. The sociodemographic characteristics are: gender (male, female); age; marital status (married, single, widowed, separated/divorced); educational level (very low: illiterate/primary school incomplete, low: primary or equivalent, medium: secondary school/vocational training, high: university degree or equivalent); activity status (receiving earnings-related pension, employed, unemployed, other situations [housewife, student, etc.…”
Section: Predictor Variables (mentioning)
confidence: 99%