Description
Air pollution is a leading cause of premature mortality worldwide and is linked to more than 1 million premature deaths per year in Africa. Advancements in low-cost sensors (LCS) are helping to bridge the data gap left by reference-grade monitors, which have historically been expensive and technically demanding to operate. While novel data science techniques are being used to develop correction factors for LCS, these studies generally (1) rely on co-locations with expensive reference-grade monitors, (2) use temperature, humidity, and other measurements to account for variation in hygroscopicity and optical properties, and (3) are local in scope, limited to one city or metro area.
Can correction factors developed at one location be applied at another? We use co-locations from 5 cities (Palisades, NY; Accra, Ghana; Lomé, Togo; Kinshasa, DRC; Kolkata, India), spanning a range of climatologies and distances, to assess how well Multiple Linear Regression, Random Forest, and Gaussian Mixture Regression correction factors perform, and we compare them to correction factors published in the literature.
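The transfer question above can be illustrated with a minimal sketch: fit a Multiple Linear Regression correction at one site and score it at another. Everything here is an assumption made for illustration, including the synthetic data generator, the feature set (raw LCS reading, relative humidity, temperature), and the helper names `make_site`, `fit_mlr`, and `predict_mlr`; none of it is the study's data or code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_site(n, rh_mean, offset):
    """Synthetic co-location (hypothetical): features [raw LCS, RH, temp] and reference PM."""
    ref = rng.gamma(shape=4.0, scale=8.0, size=n)                 # reference PM (µg/m³)
    rh = np.clip(rng.normal(rh_mean, 10.0, size=n), 5.0, 100.0)   # relative humidity (%)
    temp = rng.normal(25.0, 5.0, size=n)                          # temperature (°C)
    # Assumed sensor behaviour: humidity inflates the raw reading.
    raw = ref * (1.0 + 0.01 * rh) + offset + rng.normal(0.0, 2.0, size=n)
    return np.column_stack([raw, rh, temp]), ref

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def r2(y, yhat):
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Site A is drier, site B is more humid with a different sensor offset.
X_a, y_a = make_site(1000, rh_mean=40.0, offset=0.0)
X_b, y_b = make_site(1000, rh_mean=80.0, offset=5.0)

coef_a = fit_mlr(X_a, y_a)   # correction developed at site A
coef_b = fit_mlr(X_b, y_b)   # correction developed at site B (in-sample baseline)

transfer = predict_mlr(coef_a, X_b)   # site A correction applied at site B
local = predict_mlr(coef_b, X_b)      # site B correction applied at site B

print("raw      MAE:", mae(y_b, X_b[:, 0]))
print("transfer MAE:", mae(y_b, transfer), "R2:", r2(y_b, transfer))
print("local    MAE:", mae(y_b, local), "R2:", r2(y_b, local))
```

The gap between the transferred and local scores is one simple way to quantify how much a correction degrades when moved between climatologies. The same loop could be repeated with a Random Forest or Gaussian Mixture Regression in place of `fit_mlr`.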
Additionally, we develop a Global Gaussian Mixture Regression (GMR) machine learning model trained on co-locations from 15+ cities in the Clean Air Monitoring and Solutions Network (CAMS-Net). GMR has proven successful for correcting LCS data: in Kinshasa, GMR-corrected PurpleAir data achieved R² = 0.88 when compared to the MetOne BAM-1020, and in Accra, GMR lowered the Mean Absolute Error of Clarity data from 7.51 µg/m³ to 1.93 µg/m³. Because the Global GMR is trained on co-locations from many cities, it can correct LCS data without the need for a local co-location; we present an open-source dashboard that enables the correction of data from 20,000+ PurpleAir and Clarity sensors around the world.
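For readers unfamiliar with Gaussian Mixture Regression, the following is a minimal sketch of the generic technique: fit a Gaussian mixture to the joint distribution of the features and the reference value, then predict the conditional mean of the reference given the features. It uses scikit-learn's `GaussianMixture`. The synthetic data, the feature choice (raw reading and relative humidity), the number of components, and the helper names `fit_gmr` and `predict_gmr` are all assumptions for illustration, not the Global GMR model described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmr(X, y, n_components=3, seed=0):
    """Fit a full-covariance GMM on the joint vector [features, target]."""
    Z = np.column_stack([X, y])
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          max_iter=500, random_state=seed)
    return gmm.fit(Z)

def predict_gmr(gmm, X):
    """Conditional mean E[y | x] under the fitted joint mixture."""
    X = np.atleast_2d(X)
    n, d = X.shape
    K = gmm.n_components
    log_w = np.empty((n, K))       # unnormalised log responsibilities given x only
    cond_mean = np.empty((n, K))   # per-component conditional mean of y
    for k in range(K):
        mu_x, mu_y = gmm.means_[k, :d], gmm.means_[k, d]
        S = gmm.covariances_[k]
        S_xx, S_yx = S[:d, :d], S[d, :d]
        diff = X - mu_x
        sol = np.linalg.solve(S_xx, diff.T).T          # rows are S_xx^{-1} (x - mu_x)
        cond_mean[:, k] = mu_y + sol @ S_yx            # mu_y + S_yx S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(S_xx)
        maha = np.sum(diff * sol, axis=1)
        log_w[:, k] = np.log(gmm.weights_[k]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
    log_w -= log_w.max(axis=1, keepdims=True)          # stabilise before exponentiating
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    return np.sum(w * cond_mean, axis=1)

# Synthetic co-location data (hypothetical): humidity inflates the raw LCS reading.
rng = np.random.default_rng(0)
n = 2000
ref = rng.gamma(shape=4.0, scale=8.0, size=n)                  # reference PM (µg/m³)
rh = np.clip(rng.normal(60.0, 15.0, size=n), 5.0, 100.0)       # relative humidity (%)
raw = ref * (1.0 + 0.01 * rh) + rng.normal(0.0, 2.0, size=n)   # raw LCS PM (µg/m³)
X = np.column_stack([raw, rh])

train, test = slice(0, 1500), slice(1500, n)
model = fit_gmr(X[train], ref[train])
corrected = predict_gmr(model, X[test])

mae_raw = float(np.mean(np.abs(raw[test] - ref[test])))
mae_gmr = float(np.mean(np.abs(corrected - ref[test])))
print("MAE raw:", mae_raw, "MAE GMR-corrected:", mae_gmr)
```

Because each mixture component contributes its own local linear relationship, weighted by how likely the observed features are under that component, GMR can capture correction behaviour that differs across humidity or concentration regimes.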