New Machine Learning Method Enhances Global Air Quality Station Classification for TOAR

Dear TOAR-II community,

Our colleague Ramiyou Karim Mache and colleagues from Forschungszentrum Jülich have a new preprint to announce.

Here is a short summary of the paper:

TOAR-classifier v2: A data-driven classification tool for global air quality stations

A new study has developed a machine learning-based approach to more accurately classify air quality monitoring stations in the Tropospheric Ozone Assessment Report (TOAR) database, offering a major advancement for global air quality research. The classification of stations—into urban, suburban, or rural types—is crucial for meaningful and reliable analysis of air pollution data.

The researchers applied both unsupervised and supervised learning techniques to 23,974 TOAR stations worldwide. While traditional K-means clustering performed moderately well for urban (70.03%) and rural (71.53%) sites, it struggled with suburban classification (26.36%). Supervised models, including random forest, CatBoost, and LightGBM, showed significantly better accuracy, especially when combined into a voting ensemble model. These advanced methods achieved over 84% accuracy for urban and rural sites, and 62–65% for suburban locations.
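For readers unfamiliar with voting ensembles, the idea can be illustrated with a minimal sketch. It assumes soft voting (averaging class probabilities), which may differ from the scheme used in the preprint. The function name `soft_vote` and the probability values are made up for illustration; they are not taken from the TOAR-classifier code or data.

```python
# Hypothetical illustration of soft voting; not the authors' implementation.
CLASSES = ("urban", "suburban", "rural")

def soft_vote(model_probs):
    """Average per-class probabilities across models.

    Returns the winning label and the averaged probabilities.
    """
    n_models = len(model_probs)
    averaged = [
        sum(probs[i] for probs in model_probs) / n_models
        for i in range(len(CLASSES))
    ]
    best = max(range(len(CLASSES)), key=averaged.__getitem__)
    return CLASSES[best], averaged

# Made-up probabilities for one station from three models
# (standing in for random forest, CatBoost, and LightGBM).
station_probs = [
    [0.50, 0.40, 0.10],
    [0.30, 0.45, 0.25],
    [0.55, 0.35, 0.10],
]

label, averaged = soft_vote(station_probs)
print(label, [round(p, 2) for p in averaged])
```

In this example one model favours "suburban", but the averaged probabilities still put "urban" first, which is how an ensemble can smooth out disagreement between individual models.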

To further improve classification performance, the team introduced a probability-based threshold adjustment method that notably enhanced results for the ambiguous suburban category. Validation using independent NOx and PM2.5 air pollutant measurements confirmed the reliability of the new classifications. Additionally, a manual review of 25 selected stations via Google Maps indicated that the model’s predictions were more accurate than the original labels provided by data contributors.
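One plausible reading of a probability-based threshold adjustment is sketched below: instead of always taking the most probable class, a station is labelled suburban whenever its suburban probability reaches a chosen threshold. This is an assumption about the general technique, not a description of the procedure in the preprint. The function name `classify_with_threshold` and the threshold value 0.35 are hypothetical.

```python
# Hypothetical illustration of threshold adjustment; not the authors' method.
CLASSES = ("urban", "suburban", "rural")

def classify_with_threshold(probs, suburban_threshold=0.35):
    """Label a station 'suburban' if its suburban probability reaches
    the threshold; otherwise return the most probable class."""
    by_class = dict(zip(CLASSES, probs))
    if by_class["suburban"] >= suburban_threshold:
        return "suburban"
    return max(by_class, key=by_class.get)

# Made-up probabilities: plain argmax would say "urban" for the first station.
ambiguous = [0.45, 0.40, 0.15]
clear_urban = [0.70, 0.20, 0.10]

print(classify_with_threshold(ambiguous))
print(classify_with_threshold(clear_urban))
```

Lowering the threshold assigns more borderline stations to the suburban class at the cost of some urban and rural accuracy, so in practice the value would have to be tuned against labelled data.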

This objective and data-driven classification system lays a stronger foundation for consistent, type-specific air quality analysis within TOAR and could serve as a model for similar global datasets.

Link to the preprint:

https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1399/

Best regards,

The TOAR database team at the Jülich Supercomputing Centre

Image: results by Ramiyou Karim Mache
