This paper addresses regression modeling of local environmental pollution levels for the electric power industry. Such models are fundamental to the proper design and maintenance of high-voltage transmission lines and insulators, and help prevent hazards such as pollution-induced flashovers and the resulting power outages. The primary goal of our study was to increase the precision of regression models in this application area by exploiting additional input attributes extracted from satellite imagery and by adjusting the modeling methodology. Because thousands of attributes can be extracted from satellite images, of which only a few are likely to carry useful information, we also explored suitable feature selection procedures. We show that a suitable combination of attribute selection methods (Relief, FSRF-Test, and forward selection), regression models (random forests and M5P regression trees), and modeling methodology (estimating field-measured values of the target variables rather than their upper bounds) can significantly increase modeling accuracy, measured as the correlation between the estimated and true values of the target variables. Specifically, the correlations of our regression models rose from 0.12–0.23 to 0.40–0.64, while their relative absolute errors fell (e.g., from 1.04 to 0.764 for the best model).
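To make the two reported measures concrete, the following minimal sketch computes a Pearson correlation and a relative absolute error. It assumes the common definition of relative absolute error, namely the model's total absolute error divided by that of a baseline that always predicts the mean of the true values; the abstract does not state the formula used in the study, so this definition is an assumption. The function names and sample values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson_correlation(actual, predicted):
    """Pearson correlation between true and estimated target values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)


def relative_absolute_error(actual, predicted):
    """Total absolute error relative to a mean-predicting baseline (assumed definition)."""
    mean_a = sum(actual) / len(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - mean_a) for a in actual)
    return model_error / baseline_error


# Hypothetical values for illustration only, not data from the study.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
corr = pearson_correlation(actual, predicted)
rae = relative_absolute_error(actual, predicted)
```

Under this definition, a relative absolute error above 1 (such as the 1.04 reported before the changes) means the model does worse than always predicting the mean, while a value below 1 (such as 0.764) means it does better. The toy data also shows why the two measures are reported together: the predictions are perfectly correlated with the true values yet have a large relative absolute error.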