Air pollution prediction using machine learning

https://doi.org/10.15407/jai2025.01.141

Olenych I.¹, Demchyk D.¹, Babiak S.¹, Futey O.¹

¹ Ivan Franko Lviv National University

igor.olenych@lnu.edu.ua; d.demchik15@gmail.com; frageks@gmail.com; oleksandr.futey@lnu.edu.ua

https://orcid.org/0000-0002-6642-0222 https://orcid.org/0009-0000-0495-2939 https://orcid.org/0009-0002-8726-2742 https://orcid.org/0000-0002-6491-1669

UDC: 004.89
Publication Language: English
Stuc. intelekt. 2025; 30(1):141-146

Abstract: Predicting air pollution with particulate matter is critically important for developing effective strategies to improve the environmental situation. Despite the large number of predictive machine learning models, insufficient attention has been paid to how the effectiveness of pollution prediction varies across ranges of particulate concentration. The paper proposes models for forecasting atmospheric pollution with particulate matter up to 2.5 microns in size (PM2.5) based on the Long Short-Term Memory (LSTM), Extreme Gradient Boosting (XGBoost), and Random Forest algorithms, taking into account meteorological and spatio-temporal data obtained by the developed air quality monitoring system. Particular attention is paid to the dependence of forecasting accuracy on the level of atmospheric pollution. The proposed models successfully predict the PM2.5 content in the air at low and medium pollution levels but underestimate it as the concentration increases. Analysis of the concentration dependences of the absolute and relative errors shows that the Random Forest method achieves the highest prediction accuracy over a wide range of PM2.5 concentrations, with a relative error of 6–9 %, despite deviations for some peak values. Models based on the XGBoost and LSTM methods have relative errors of 9–11 % and 11–14 %, respectively. As the concentration of particulate matter in the air increases, forecast accuracy decreases and the variance of the predicted values grows significantly. The LSTM method gives the worst results at high levels of air pollution.
The decrease in the effectiveness of the predictive models with increasing atmospheric pollution may be due to the small number of records with a high concentration of particulate matter in the dataset and to the random appearance of additional pollution sources unrelated to meteorological conditions and spatio-temporal characteristics. An integral assessment of the accuracy of the developed models using the Mean Absolute Error (MAE), Mean Squared Error (MSE), and coefficient of determination (R²) metrics confirms the high efficiency of predicting the PM2.5 concentration in the air.
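
The evaluation described in the abstract combines integral metrics (MAE, MSE, R²) with relative errors examined per concentration range. A minimal sketch of how such an evaluation might be computed is given below. It is not the authors' code: the function names, the range boundaries, and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for paired observed and predicted values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = mean(abs(e) for e in errors)
    mse = mean(e * e for e in errors)
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2


def relative_error_by_range(y_true, y_pred, edges):
    """Mean relative error (%) of predictions within each range [lo, hi)."""
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        rel = [abs(p - t) / t * 100.0
               for t, p in zip(y_true, y_pred) if lo <= t < hi]
        # None marks a range that contains no observations.
        result[(lo, hi)] = mean(rel) if rel else None
    return result


# Made-up PM2.5 values in µg/m³ (assumption: not data from the paper).
observed = [10.0, 20.0, 40.0, 80.0]
predicted = [11.0, 18.0, 36.0, 64.0]

mae, mse, r2 = regression_metrics(observed, predicted)
# Hypothetical range boundaries for low, medium, and high pollution.
by_range = relative_error_by_range(observed, predicted, [0, 25, 50, 100])
```

Splitting the relative error by concentration range, rather than reporting only a single aggregate value, makes it visible when a model that scores well overall degrades at high pollution levels, which is the effect the abstract reports.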

Keywords: air pollution, forecasting, model efficiency, machine learning, artificial neural networks.
