1999
DOI: 10.3386/t0246

Estimating Log Models: To Transform or Not to Transform?

Abstract: Data on health care expenditures, length of stay, utilization of health services, consumption of unhealthy commodities, etc. are typically characterized by: (a) nonnegative outcomes; (b) nontrivial fractions of zero outcomes in the population (and sample); and (c) positively-skewed distributions of the nonzero realizations. Similar data structures are encountered in labor economics as well. This paper provides simulation-based evidence on the finite-sample behavior of two sets of estimators designed to look at…
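The three features listed in the abstract can be reproduced with a small simulation. This sketch is illustrative only: the helper names (`simulate_costs`, `skewness`), the 30% share of zeros, and the lognormal shape of the positive values are assumptions chosen for the example, not the simulation designs studied in the paper.

```python
import math
import random


def simulate_costs(n, p_zero=0.3, mu=7.0, sigma=1.2, seed=0):
    """Hypothetical generator for expenditure-like data: a point mass at zero
    plus lognormal positive values (all parameters are illustrative)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        if rng.random() < p_zero:
            data.append(0.0)  # (b) a nontrivial share of exact zeros
        else:
            data.append(rng.lognormvariate(mu, sigma))  # (c) right-skewed positives
    return data


def skewness(values):
    """Moment-based sample skewness: third central moment over sd cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


y = simulate_costs(10_000)
positives = [v for v in y if v > 0]
print("all nonnegative:", min(y) >= 0)                            # (a)
print("share of zeros:", round(1 - len(positives) / len(y), 3))   # (b)
print("skewness of positives:", round(skewness(positives), 2))    # (c)
```

A large positive skewness, with the mean of the positive values well above their median, is the pattern that motivates modelling costs on a log scale or with a log link.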

Cited by 434 publications (558 citation statements)

“…Opioid prescriptions, health care utilization, and health/quality of care outcomes were estimated using a probit model to predict the probability of an opioid prescription, advanced imaging (i.e., MRI or CT scan), radiography, ED visit, hospitalization, any surgery, or any serious diagnosis such as cancer and nonmusculoskeletal reasons for back pain (see Table ). Costs were estimated using a generalized linear model (GLM) assuming a gamma distribution and log link to control for skewness of the data in estimating health care costs (Manning and Mullahy ). Where costs had a large proportion of zero values, costs were estimated using a two‐part model where the first part estimates the probability of having any health care costs, and the second part estimates the expected health care costs among non‐negative cost values (Belotti et al.…”
Section: Methods (mentioning, confidence 99%)

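The gamma GLM with a log link mentioned in the quotation models E[y|x] = exp(xβ) directly. A minimal sketch of fitting it by iteratively reweighted least squares (IRLS), assuming a single covariate and strictly positive outcomes: because the gamma variance is proportional to μ² and dμ/dη = μ under the log link, every IRLS weight is equal, so each step reduces to an ordinary regression of the working response z = η + (y − μ)/μ on x. The function names and data are hypothetical; this is not the citing authors' software.

```python
import math


def ols(x, z):
    """Closed-form simple regression of z on an intercept and x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    return mz - b1 * mx, b1


def gamma_log_glm(x, y, tol=1e-10, max_iter=100):
    """Fit E[y|x] = exp(b0 + b1*x) with a gamma family and log link by IRLS.

    With V(mu) proportional to mu**2 and d(mu)/d(eta) = mu, all IRLS weights
    are equal, so each step is an unweighted OLS on the working response.
    Requires y > 0 (e.g. the positive part of a two-part model).
    """
    # Start from OLS on log(y), i.e. the first step taken from mu = y
    b0, b1 = ols(x, [math.log(yi) for yi in y])
    for _ in range(max_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        new_b0, new_b1 = ols(x, z)
        if abs(new_b0 - b0) + abs(new_b1 - b1) < tol:
            return new_b0, new_b1
        b0, b1 = new_b0, new_b1
    return b0, b1


x = [float(i) for i in range(20)]
# Noise-free data: IRLS should recover the generating coefficients
y_exact = [math.exp(0.5 + 0.3 * xi) for xi in x]
print(gamma_log_glm(x, y_exact))
# Hypothetical multiplicative deviations with mean 1 (1.5, 0.5, 1.5, ...)
y_noisy = [math.exp(0.5 + 0.3 * xi) * (1.5 if i % 2 == 0 else 0.5)
           for i, xi in enumerate(x)]
b0, b1 = gamma_log_glm(x, y_noisy)
```

At convergence the estimates satisfy the score equations Σ(y/μ − 1) = 0 and Σ(y/μ − 1)x = 0, so fitted means are on the original cost scale and no retransformation step is needed.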
“…Table 6 demonstrates that the same pattern observed with costs was seen with significantly higher medical service utilization (emergency department admissions, inpatient admissions, and physician office visits) for patients using FGA medications relative to those using SGA medications ( P < 0.01), using ordinal logit link function regression after adjusting for the same demographic and comorbidity covariates shown in Table 4. As a further sensitivity analysis check, we also re‐estimated the cost regressions as two‐part models using log‐transformed costs in the second equations and found predicted cost differences similar to those reported in Tables 4 and 5[35].…”
Section: Results (mentioning, confidence 99%)

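A two-part model with log-transformed costs in the second equation needs a retransformation before its predictions can be compared on the cost scale. A minimal sketch, assuming one binary covariate: part one is the share of positive costs in each group (a simple stand-in for a probit or logit), part two is OLS on log costs among the positives, and the retransformation uses Duan's smearing factor, the sample mean of exp(residual). The data and the name `fit_two_part` are hypothetical.

```python
import math


def fit_two_part(d, y):
    """Two-part model with a single binary covariate d (0/1).

    Part 1: Pr(y > 0 | d), here the share of positive outcomes per group.
    Part 2: OLS of log(y) on d among positive outcomes, retransformed with
            the smearing factor mean(exp(residual)).
    Assumes each group has at least one positive outcome.
    Returns a function giving E[y | d] on the original cost scale.
    """
    share = {}
    for g in (0, 1):
        ys = [yi for di, yi in zip(d, y) if di == g]
        share[g] = sum(1 for yi in ys if yi > 0) / len(ys)

    pos = [(di, math.log(yi)) for di, yi in zip(d, y) if yi > 0]
    # With a binary regressor, OLS reduces to the two group means of log(y)
    mean_log = {}
    for g in (0, 1):
        logs = [ly for di, ly in pos if di == g]
        mean_log[g] = sum(logs) / len(logs)
    b0, b1 = mean_log[0], mean_log[1] - mean_log[0]

    resid = [ly - (b0 + b1 * di) for di, ly in pos]
    smear = sum(math.exp(r) for r in resid) / len(resid)

    def predict(g):
        # E[y | d] = Pr(y > 0 | d) * E[y | y > 0, d]
        return share[g] * math.exp(b0 + b1 * g) * smear

    return predict


d = [0] * 6 + [1] * 5
y = [0, 0, 100, 200, 400, 800, 0, 200, 400, 800, 1600]
predict = fit_two_part(d, y)
print(predict(0), predict(1))
```

In this toy data the positive costs in group 1 are exactly twice those in group 0, so the log-scale residuals have the same spread in both groups and the predictions reproduce the raw group means (250 and 600, up to floating-point rounding). Without the smearing factor, the positive part for group 0 would be the geometric mean, about 283, instead of the arithmetic mean of 375.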
“…However, because the dependent variable was transformed to be on a different scale, it was necessary to re‐transform the associated expected value, marginal effects and elasticity estimates to follow the original scale of the data. This re‐transformation process could have introduced bias, the magnitude of which would be proportional to the correlation between the variance of the error term and the explanatory variables (Manning and Mullahy, ).…”
Section: Empirical Model Specification and Estimation Strategy (mentioning, confidence 99%)

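The re-transformation problem in this passage can be written out explicitly. The derivation below assumes normally distributed errors on the log scale with a variance σ²(x) that may depend on the regressors; that distributional assumption is added here for illustration and is not stated in the quotation.

```latex
\begin{aligned}
\ln y_i &= x_i\beta + \varepsilon_i, \qquad E[\varepsilon_i \mid x_i] = 0,\\
E[y_i \mid x_i] &= e^{x_i\beta}\,E\!\left[e^{\varepsilon_i} \mid x_i\right],\\
\text{if } \varepsilon_i \mid x_i \sim N\!\left(0,\sigma^2(x_i)\right):\quad
E[y_i \mid x_i] &= \exp\!\left(x_i\beta + \tfrac{1}{2}\sigma^2(x_i)\right),\\
\frac{\partial E[y_i \mid x_i]}{\partial x_{ik}} &= E[y_i \mid x_i]\left(\beta_k + \tfrac{1}{2}\,\frac{\partial \sigma^2(x_i)}{\partial x_{ik}}\right).
\end{aligned}
```

With a constant σ², the factor E[e^ε | x] is the same for every observation and a single smearing constant, (1/n)Σ exp(ε̂ᵢ), can estimate it. When σ² varies with x, no single constant recovers exp(σ²(x)/2), and marginal effects computed as E[y|x]·βₖ omit the ½ ∂σ²/∂xₖ term; the size of the bias therefore depends on how the error variance moves with the explanatory variables.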
“…We thus checked the correlation between the model error terms and the explanatory variables, finding it to be very weak for most variables; the highest correlation coefficient was for education (0.24). Nevertheless, to test the sensitivity of the results to this potential bias, we also estimated standard and two‐part generalised linear models (GLMs) (Manning and Mullahy, ). Because GLMs model the expected value directly and do not require a re‐transformation of the estimates, they are claimed to be free from the bias arising from re‐transformation (Manning and Mullahy, ).…”
Section: Empirical Model Specification and Estimation Strategy (mentioning, confidence 99%)

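A diagnostic along the lines of the quoted check can be sketched in a few lines. One subtlety: OLS residuals are uncorrelated with the included regressors by construction, so this sketch correlates the squared log-scale residuals with the covariate, which targets the error variance rather than the error itself. That choice, the single covariate, the simulated data, and the helper names are assumptions, not a description of what the cited study computed.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def log_ols_residuals(x, y):
    """Residuals from OLS of log(y) on an intercept and x (requires y > 0)."""
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(x) / n, sum(ly) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (li - my) for xi, li in zip(x, ly)) / sxx
    b0 = my - b1 * mx
    return [li - (b0 + b1 * xi) for xi, li in zip(x, ly)]


def variance_correlation(x, y):
    """Correlation between x and the squared log-scale residuals."""
    resid = log_ols_residuals(x, y)
    # Raw residuals are orthogonal to x by construction, so use their squares
    return pearson(x, [r * r for r in resid])


x = [float(i) for i in range(1, 41)]
# Hypothetical log-scale errors whose spread grows with x (heteroskedastic)
eps = [0.05 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [math.exp(1.0 + 0.2 * xi + e) for xi, e in zip(x, eps)]
print(round(variance_correlation(x, y), 3))
```

A clearly positive value, as in this constructed example, signals that a single retransformation factor is likely to misstate predicted levels at high values of x.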