Background
High temperatures in urban areas, driven by climate change and urban heat islands, have increased the number of heatstroke patients. To prevent heatstroke, an accurate model for predicting heatstroke patients is needed so that people can be alerted to their risk. However, most previous models have not been tested with sufficient training data, even though susceptibility to heatstroke is likely to depend on year-wise trends, which makes model accuracy sensitive to the choice of training data. We investigated an accurate heatstroke risk model that is robust to the training data. We also investigated how to select training data for constructing an accurate model, by examining the factors that affect accuracy and the trade-off between quantity and quality that arises when older data are added to the training data.
Method
We compared the accuracies of three methods: multiple regression analysis (MR), a generalized additive model (GAM), and time-stratified case-crossover analysis (TC). All combinations of training data from 2012 onward were tested. By comparing the errors of each method, we identified the factors in the training data that influence the error.
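For readers unfamiliar with TC, the following minimal sketch shows how control (referent) days are commonly chosen in a time-stratified case-crossover design. It assumes the widely used stratification by year, month, and day of the week; the abstract does not state which strata were used in this study, and the function name `referent_days` is hypothetical.

```python
from datetime import date, timedelta


def referent_days(case_day: date) -> list[date]:
    """Return the control days for one case day.

    Assumption (not stated in the abstract): a stratum is the set of days
    sharing the case day's year, month, and day of the week. The case day
    itself is excluded from the result.
    """
    day = date(case_day.year, case_day.month, 1)
    controls = []
    # Walk through every day of the case day's month.
    while day.month == case_day.month:
        if day.weekday() == case_day.weekday() and day != case_day:
            controls.append(day)
        day += timedelta(days=1)
    return controls


# Example: a case on Wednesday, 18 July 2018 is compared with the
# other Wednesdays of July 2018.
print(referent_days(date(2018, 7, 18)))
```

Because each case is compared only with days from the same short stratum, slowly varying factors such as year-wise trends are controlled by design, which is consistent with TC being less sensitive to the choice of training years.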
Results
The TC errors were the smallest (p < 0.005) and were much less sensitive to the training data than those of the other methods. The odds ratios with the best accuracy were 1.41–1.44. For MR and GAM, the error was significantly larger when the number of extremely hot days differed between the training and test data (p < 0.01 and p < 0.05, respectively). When past years were added to the training data retroactively, starting from the most recent year, the accuracy of all three methods tended to increase up to a certain point and then decrease from the middle years onward.
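The two ways of building training sets described here can be sketched as follows. This is an illustration only: the year range 2012 to 2017 is a hypothetical example (the abstract gives only the start year), "all combinations" is assumed to mean every non-empty subset of years, and both function names are invented for this sketch.

```python
from itertools import combinations


def all_training_sets(years):
    """Every non-empty subset of the available years (assumed meaning of
    'all combinations')."""
    ordered = sorted(years)
    return [
        list(subset)
        for size in range(1, len(ordered) + 1)
        for subset in combinations(ordered, size)
    ]


def retroactive_windows(years):
    """Consecutive-year training sets grown backward from the latest year."""
    newest_first = sorted(years, reverse=True)
    return [sorted(newest_first[:k]) for k in range(1, len(newest_first) + 1)]


# Hypothetical example range; the actual final year is not given in the abstract.
years = range(2012, 2018)
print(len(all_training_sets(years)))
for window in retroactive_windows(years):
    print(window)
```

Evaluating each window against held-out test data and plotting error against window length is one way to locate the point at which adding older years stops helping.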
Conclusions
By using the odds ratios produced by TC, which has low sensitivity to training data, we can develop a highly accurate heatstroke risk model that is robust to training data, which has not been possible with previous models. If consecutive years of data are to be included, a model constructed from the most recent three or four years of data is the most accurate.