Low-cost sensors
(LCSs) for air quality monitoring have enormous
potential to improve air quality data coverage in resource-limited
parts of the world such as sub-Saharan Africa. LCSs, however, are
affected by environment and source conditions. To establish high-quality
data, LCSs must be collocated and calibrated with reference-grade
PM2.5 monitors. From March 2020, a low-cost PurpleAir PM2.5 monitor was collocated with a Met One Beta Attenuation
Monitor 1020 in Accra, Ghana. While previous studies have shown that
multiple linear regression (MLR) and random forest regression (RF)
can improve accuracy and correlation between PurpleAir and reference
data, MLR and RF yielded suboptimal improvement in the Accra collocation
(R² = 0.81 and R² = 0.81, respectively). We present the first application of
Gaussian mixture regression (GMR) to air quality data calibration
and demonstrate improvement over traditional methods by increasing
the collocated PM2.5 correlation and accuracy to R² = 0.88 and
MAE = 2.2 μg m⁻³. A Gaussian mixture model (GMM) is a probability density estimator
and clustering method from which nonlinear regressions that tolerate
missing inputs can be derived. We find that even when given missing
inputs, GMR provides better correlation than MLR and RF performed
with complete data. GMR also allows us to estimate calibration certainty.
When evaluated, 95% confidence intervals agreed with reference PM2.5 data 96% of the time, suggesting that the model accurately
assesses its own confidence. Additionally, clustering within the GMM
is consistent with climate characteristics, providing confidence that
the calibration approach can learn underlying relationships in data.
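
To make the idea of deriving a regression from a GMM concrete, the following is a minimal sketch of GMR with one input (raw sensor PM2.5) and one output (reference PM2.5). It is not the authors' implementation. The two mixture components and all of their parameters are hypothetical values chosen for illustration, not fitted to the Accra data, and the function names `gmr_predict` and `interval_95` are assumptions. The 95% interval uses a single-Gaussian approximation (mean ± 1.96 standard deviations) of the conditional mixture.

```python
import math

# Hypothetical 2-component GMM over the joint (x, y), where
# x = raw low-cost sensor PM2.5 and y = reference PM2.5 (both in ug/m^3).
# Each component has a weight, a mean (mx, my), and covariance entries
# (sxx, sxy, syy). Values are illustrative only, not fitted to real data.
COMPONENTS = [
    {"w": 0.6, "mx": 20.0, "my": 15.0, "sxx": 25.0, "sxy": 15.0, "syy": 16.0},
    {"w": 0.4, "mx": 60.0, "my": 35.0, "sxx": 100.0, "sxy": 40.0, "syy": 36.0},
]


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x, components=COMPONENTS):
    """Return (mean, variance) of y given x under the joint GMM."""
    # Responsibility of each component given only the observed input x.
    lik = [c["w"] * normal_pdf(x, c["mx"], c["sxx"]) for c in components]
    total = sum(lik)
    resp = [l / total for l in lik]

    # Each component contributes a local linear regression (conditional Gaussian).
    cond_means = [c["my"] + c["sxy"] / c["sxx"] * (x - c["mx"]) for c in components]
    cond_vars = [c["syy"] - c["sxy"] ** 2 / c["sxx"] for c in components]

    # Blend the local regressions by responsibility; the result is nonlinear in x.
    mean = sum(r * m for r, m in zip(resp, cond_means))
    # Law of total variance for the conditional mixture.
    second_moment = sum(r * (v + m ** 2) for r, v, m in zip(resp, cond_vars, cond_means))
    return mean, second_moment - mean ** 2


def interval_95(x, components=COMPONENTS):
    """Approximate 95% interval for y given x (single-Gaussian approximation)."""
    mean, var = gmr_predict(x, components)
    half_width = 1.96 * math.sqrt(var)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    for raw in (20.0, 40.0, 60.0):
        est, var = gmr_predict(raw)
        lo, hi = interval_95(raw)
        print(f"raw={raw:5.1f}  calibrated={est:6.2f}  95% interval=({lo:6.2f}, {hi:6.2f})")
```

With more than one input (for example temperature and relative humidity alongside the raw reading), the same conditioning step can be applied to whichever inputs are present, which is what allows a regression of this kind to tolerate missing inputs. Coverage of the intervals can then be checked by counting how often reference values fall inside them.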