Artificial intelligence

Scientific journal

ISSN 2710-1673

ONLINE: ISSN 2710-1681

Air pollution prediction using machine learning

Olenych I.¹, Demchyk D.¹, Babiak S.¹, Futey O.¹
¹ Ivan Franko Lviv National University
igor.olenych@lnu.edu.ua; d.demchik15@gmail.com; frageks@gmail.com; oleksandr.futey@lnu.edu.ua

UDC: 004.89
Publication Language: English
Stuc. intelekt. 2025; 30(1):141-146

Abstract: Predicting air pollution by particulate matter is critically important for developing effective strategies to improve the environmental situation. Although many machine learning prediction models exist, little attention has been paid to how prediction effectiveness varies across different ranges of microparticle concentration. This paper proposes models for forecasting atmospheric pollution by particulate matter up to 2.5 microns in size (PM2.5) based on the Long Short-Term Memory (LSTM), Extreme Gradient Boosting (XGBoost), and Random Forest algorithms, using meteorological and spatio-temporal data collected by the air quality monitoring system developed by the authors. Particular attention is paid to how forecasting accuracy depends on the level of atmospheric pollution. The proposed models successfully predict the PM2.5 content in the air at low and medium pollution levels but underestimate it as the concentration increases. An analysis of how absolute and relative errors depend on concentration shows that the Random Forest method achieves the highest prediction accuracy over a wide range of PM2.5 concentrations, with a relative error of 6–9 %, despite deviations for some peak values. The XGBoost and LSTM models have relative errors of 9–11 % and 11–14 %, respectively. As the concentration of particulate matter in the air increases, forecast accuracy decreases and the variance of the predicted values grows significantly. The LSTM method performs worst at high levels of air pollution.
The decline in model effectiveness at higher pollution levels may be due to the small number of records with high particulate matter concentrations in the dataset and to the random appearance of additional pollution sources unrelated to meteorological conditions or spatio-temporal characteristics. An integral assessment of the developed models using Mean Absolute Error (MAE), Mean Squared Error (MSE), and the coefficient of determination R² confirms that they predict PM2.5 concentration in the air with high efficiency.
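The abstract evaluates the models in two ways: integral metrics (MAE, MSE, R²) over the whole test set, and the dependence of the error on the PM2.5 level. The following is a minimal Python sketch of both computations under stated assumptions. The relative error is taken here as |y − ŷ| / y averaged within bins of observed concentration. The helper name `relative_error_by_bin`, the bin edges, the µg/m³ units and the toy values are all hypothetical and are not taken from the paper's data or code.

```python
# Minimal sketch (assumptions labelled): integral metrics plus a
# concentration-binned relative error for PM2.5 forecasts.
from statistics import mean


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y - y_hat|."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean Squared Error: average of (y - y_hat)^2."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_error_by_bin(y_true, y_pred, edges):
    """Hypothetical helper: for each observed-concentration bin [lo, hi),
    return (sample count, mean relative error |y - y_hat| / y).
    This relative-error definition is an assumption, not the paper's."""
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        errs = [abs(t - p) / t
                for t, p in zip(y_true, y_pred)
                if lo <= t < hi and t > 0]
        result[(lo, hi)] = (len(errs), mean(errs) if errs else None)
    return result


# Toy values in assumed units of µg/m³; NOT data from the study.
observed = [8.0, 12.0, 20.0, 35.0, 60.0]
predicted = [8.4, 11.4, 21.0, 32.0, 50.0]

if __name__ == "__main__":
    print(f"MAE = {mae(observed, predicted):.3f}")
    print(f"MSE = {mse(observed, predicted):.3f}")
    print(f"R2  = {r2(observed, predicted):.3f}")
    # Hypothetical bin edges; the paper's own concentration ranges are not given here.
    bins = relative_error_by_bin(observed, predicted, [0, 15, 40, float("inf")])
    for (lo, hi), (n, rel) in bins.items():
        rel_txt = "n/a" if rel is None else f"{rel:.1%}"
        print(f"[{lo}, {hi}): n={n}, mean relative error={rel_txt}")
```

The toy numbers only illustrate the mechanics. Binning shows how a single model's error can grow with concentration, while MAE, MSE and R² each condense the whole test set into one number. Bins with few samples, which the abstract identifies as a problem at high concentrations, give noisy per-bin estimates, so the sample count is reported next to each error.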

Keywords: air pollution, forecasting, model efficiency, machine learning, artificial neural networks.

