Traditional water quality observations are obtained through field sampling and laboratory chemical analysis, which incur high labor, financial and time costs (Das & Jain, 2017; Zulkifli et al., 2018). This data limitation is further exacerbated by policy, cultural and technical barriers to free data sharing (Li et al., 2021). Recently, in situ water quality monitoring techniques based on various sensors have developed rapidly (Kruse, 2018; Singh et al., 2022). Sensor-based in situ monitoring avoids tedious sampling procedures and complex analytical processes and produces continuous, high-frequency observations. This approach has been increasingly applied worldwide and has produced a large amount of water quality data (Meyer et al., 2019; Park et al., 2020), providing an opportunity to use environmental big data to improve WWQ modeling. However, compared with traditional sampling and laboratory analysis, long-term sensing in complex natural environments is subject to larger measurement errors stemming from biofouling, lack of equipment calibration, background ion interference and other factors (Mahmud et al., 2020; Makarov et al., 2021). Typical error types in sensor data include outliers, noise, constant values, missing data, bias due to incorrect calibration, sensor drift from the calibration level and other errors (Horsburgh et al., 2015; Teh et al., 2020). In addition, as most existing sensors are installed to