Abstract. Water quality monitoring is important for maintaining the cleanliness and health of water bodies. It enables us to identify sources of pollution and study trends. While modern methods include the use of satellite images to estimate water quality parameters, commonly used satellite systems, such as Landsat and Sentinel, only generate images with a temporal resolution of 2 to 16 days on average. The Himawari-8 satellite system, on the other hand, generates full-disk images every 10 minutes, making it possible to generate concentration maps of water quality parameters more frequently. This paper presents a preliminary analysis of yearly and seasonal Chlorophyll-a (Chl-a) and Total Suspended Matter (TSM) estimation models built from Himawari-8 satellite images using linear regression. Correlation analysis shows that the single spectral bands and band ratios involving the Red band have the strongest relationship with Chl-a and TSM. The yearly and seasonal linear regression models yielded R² values of 0.4 to 0.5, with RMSE values of around 3 micrograms/cm³ for Chl-a and 9.5 grams/m³ for TSM. Results also indicate that the seasonal models are better than the yearly models in terms of fit and error. Results from this preliminary investigation will be used to generate a more robust global model in future studies.
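To make the modelling step in the abstract above concrete, the following is a minimal sketch of fitting a single-predictor linear regression on a band ratio and reporting R² and RMSE. It is not the authors' implementation: the function names, the choice of a Red/Green ratio, and all reflectance and Chl-a values are assumptions made up purely for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r2_and_rmse(x, y, intercept, slope):
    """Return (R^2, RMSE) of the fitted line on the given samples."""
    predictions = [intercept + slope * xi for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


if __name__ == "__main__":
    # Hypothetical, made-up samples: band reflectances and in-situ Chl-a.
    red = [0.08, 0.10, 0.12, 0.15, 0.18]
    green = [0.10, 0.11, 0.10, 0.12, 0.11]
    chl_a = [4.1, 5.0, 7.2, 7.9, 10.3]

    # Band ratio used as the single predictor (an illustrative choice).
    ratio = [r / g for r, g in zip(red, green)]

    intercept, slope = fit_line(ratio, chl_a)
    r2, rmse = r2_and_rmse(ratio, chl_a, intercept, slope)
    print(f"R^2 = {r2:.2f}, RMSE = {rmse:.2f}")
```

A seasonal model in this sketch would simply be the same fit applied to the subset of samples falling within one season, which is one way to compare fit and error against a yearly model.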
Abstract. Recent studies have investigated the use of satellite imaging combined with machine learning for modelling the Chlorophyll-a (Chl-a) concentration of bodies of water. However, most of these studies use satellite data that lack the temporal resolution needed to monitor dynamic changes in Chl-a in productive lakes like Laguna Lake. Thus, the aim of this paper is to present a methodology for modelling the Chl-a concentration of Laguna Lake in the Philippines using satellite imaging and machine learning algorithms. The methodology uses images from the Himawari-8 satellite, which have a spatial resolution of 0.5–2 km and are taken every 10 minutes. These are converted into GeoTIFF format, where differences in spatial resolution are resolved. Additionally, radiometric correction, resampling, and filtering of the Himawari-8 bands to exclude cloud-contaminated pixels are performed. Subsequently, several regression and gradient boosting machine learning algorithms are applied to the training dataset and evaluated, namely: Simple Linear Regression, Ridge Regression, Lasso Regression, and Light Gradient Boosting Machine (LightGBM). The results show that machine learning algorithms can model the near real-time variations in Chl-a content of a body of water, specifically Laguna Lake, to an acceptable margin of error. The regression models performed similarly, with a train RMSE of 1.44 and a test RMSE of 2.51 for Simple Linear Regression and 2.48 for Ridge and Lasso Regression. The linear regression models exhibited a larger degree of overfitting than the LightGBM model, which had a 2.18 train RMSE.
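The train-versus-test RMSE comparison described in the abstract above can be illustrated with a small sketch. It assumes a single predictor and uses the closed-form ridge solution with an unpenalised intercept (alpha = 0 reduces to ordinary least squares). The function names, the alpha value, and every data point are hypothetical, and the sketch does not reproduce the paper's pipeline or its LightGBM model.

```python
import math


def fit_ridge_1d(x, y, alpha):
    """Ridge fit of y ~ intercept + slope * x; only the slope is penalised."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope towards zero
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(x, y, intercept, slope):
    """Root-mean-square error of the fitted line on the given samples."""
    squared = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical, made-up samples: one band-derived feature vs. Chl-a.
    x_train = [0.8, 0.9, 1.1, 1.3, 1.6, 1.7]
    y_train = [4.0, 5.2, 6.1, 8.3, 9.0, 10.4]
    x_test = [1.0, 1.4, 1.8]
    y_test = [6.5, 7.1, 12.0]

    for name, alpha in [("linear", 0.0), ("ridge", 0.1)]:
        intercept, slope = fit_ridge_1d(x_train, y_train, alpha)
        train_err = rmse(x_train, y_train, intercept, slope)
        test_err = rmse(x_test, y_test, intercept, slope)
        # A test RMSE well above the train RMSE suggests overfitting.
        print(f"{name}: train RMSE = {train_err:.2f}, test RMSE = {test_err:.2f}")
```

Reading the two numbers side by side is the point of the exercise: the gap between train and test RMSE, rather than either value alone, is what indicates how well a model generalises.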