Improving Regression Forecasting Using Hybrid Ensemble-Neural Network Models


https://doi.org/10.15407/jai2025.02.096


Khamar I.O.¹, Olenych I.B.¹

¹ Ivan Franko National University of Lviv

ivan.khamar@lnu.edu.ua; igor.olenych@lnu.edu.ua

https://orcid.org/0009-0000-0514-903X https://orcid.org/0000-0002-6642-0222


UDC: 004.9
Language of publication: English
Stuc. intelekt. 2025; 30(2): 96–103

Abstract: In regression forecasting problems based on large, noisy datasets, one often has to choose between classical machine learning algorithms and modern neural network methods. Classical methods are simpler and more interpretable, while neural networks handle heterogeneous and high-dimensional data better, although they require more computational resources and are harder to fine-tune. This paper presents a comparative analysis of Random Forest (RF), XGBoost, and Dense Neural Network (DNN) regression models for processing large tabular datasets, using the IMDb dataset from the Kaggle platform as a case study. Particular attention is paid to whether prediction performance can be improved by combining the RF and XGBoost ensemble methods with DNN models. The RF model demonstrated acceptable predictive quality, with a coefficient of determination (R²) of 0.8640. The XGBoost-based model performed considerably better, with an R² of 0.9245. The baseline DNN model achieved an R² of 0.8990, which increased to 0.9179 after hyperparameter optimization. A hybrid approach is proposed as an additional way to improve the DNN model: the feature importance scores computed by the RF and XGBoost methods are used as weighting coefficients for the DNN input feature vector. The hybrid models produced the most accurate forecasts among the models studied, with R² values of 0.9283 for RF-DNN and 0.9302 for XGBoost-DNN. The results can be used to develop predictive models based on heterogeneous and high-dimensional tabular data.
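The hybrid weighting step and the R² metric described in the abstract can be sketched in plain Python. The abstract does not say how the RF or XGBoost importance scores were normalized before being applied to the DNN inputs, so this sketch assumes they are rescaled to average 1 and multiplied element-wise into each (already standardized) feature row. The helpers `importance_weights`, `weight_features` and `r_squared` are hypothetical, not the authors' code. In a real pipeline the scores could come from the `feature_importances_` attribute of a fitted scikit-learn `RandomForestRegressor` or an `xgboost.XGBRegressor`, and the weighted rows would then be fed to the DNN.

```python
import math
from typing import List, Sequence


def importance_weights(importances: Sequence[float]) -> List[float]:
    """Turn non-negative importance scores into per-feature input weights.

    Assumption (not stated in the paper): scores are rescaled so the
    weights average to 1, keeping the overall magnitude of standardized
    inputs roughly unchanged.
    """
    if any(imp < 0 for imp in importances):
        raise ValueError("importance scores must be non-negative")
    total = math.fsum(importances)
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    n = len(importances)
    return [n * imp / total for imp in importances]


def weight_features(rows: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[List[float]]:
    """Scale each feature column by its weight (element-wise product per row)."""
    weighted = []
    for row in rows:
        if len(row) != len(weights):
            raise ValueError("row length must match the number of weights")
        weighted.append([x * w for x, w in zip(row, weights)])
    return weighted


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = math.fsum(y_true) / len(y_true)
    ss_res = math.fsum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = math.fsum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up importance scores for three features, standing in for the
    # feature_importances_ of a fitted RF or XGBoost regressor.
    weights = importance_weights([0.5, 0.3, 0.2])
    X = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
    print(weights)
    print(weight_features(X, weights))
    # Toy targets and predictions, only to exercise the metric.
    print(r_squared([3.0, 5.0, 7.0], [2.9, 5.2, 6.8]))
```

Because tree-ensemble importances are non-negative, features the ensemble deems irrelevant receive weights near zero and are damped before the network sees them. Whether this matches the paper's exact normalization cannot be confirmed from the abstract alone.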

Keywords: forecasting, model efficiency, machine learning, ensemble methods, dense neural networks, feature engineering

References:

Bjerre, L. M., Peixoto, C., Alkurd, R., Talarico, R., & Abielmona, R. (2024). Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction. Global Epidemiology, 8, 100168. https://doi.org/10.1016/j.gloepi.2024.100168
Ha, S., Park, J., & Jo, K. (2025). Comparative analysis of regression algorithms for drug response prediction using GDSC dataset. BMC Research Notes, 18(Suppl 1), 10. https://doi.org/10.1186/s13104-024-07026-w
Olenych, I., Demchyk, D., Babiak, S., & Futey, O. (2025). Air pollution prediction using machine learning. Artificial Intelligence, No. 1(102), 141–146. https://doi.org/10.15407/jai2025.01.141
Shivashankar, S. K., Prajwal, M. D., Likith Raj, K. R., Priyadarshini, T. A. R., & Manvitha, S. M. (2024). Forest fire prediction using random forest regressor: A comprehensive machine learning approach. International Journal of Innovative Science and Research Technology, 9(9), 2063–2071. https://doi.org/10.38124/ijisrt/IJISRT24SEP1290
Sharma, A. K., Li, L. H., & Ahmad, R. (2023). Default Risk Prediction Using Random Forest and XGBoosting Classifier. In 2021 International Conference on Security and Information Technologies with AI, Internet Computing and Big-data Applications. Smart Innovation, Systems and Technologies, 314. https://doi.org/10.1007/978-3-031-05491-4_10
Bhattacharya, S., Liu, Z., & Maiti, T. (2024). Comprehensive study of variational Bayes classification for dense deep neural networks. Statistics and Computing, 34, 17. https://doi.org/10.1007/s11222-023-10338-9
Hegde, R. S. (2019). Deep neural network (DNN) surrogate models for the accelerated design of optical devices and systems: moving beyond fully-connected feed forward architectures. Proc. SPIE, 11105, 1110508. https://doi.org/10.1117/12.2528380
Elsayed, A., Levison, J., Binns, A., Larocque, M., & Goel, P. (2025). Regression-based machine learning models for nitrate and chloride prediction in surface water in a small agricultural sand plain sub-watershed in southwestern Ontario, Canada. Frontiers in Environmental Science, 13, 1543852. https://doi.org/10.3389/fenvs.2025.1543852
Li, H., Rajbahadur, G. K., Lin, D., Bezemer, C.-P., & Jiang, Z. M. (2024). Keeping deep learning models in check: A history-based approach to mitigate overfitting. IEEE Access, 12, 70676–70689. https://doi.org/10.1109/ACCESS.2024.3402543
Ha, S., Jeong, S., & Lee, J. (2024). Domain-aware fine-tuning: Enhancing neural network adaptability. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 12261–12269. https://doi.org/10.1609/aaai.v38i11.29116
Chen, C.-H., Lai, J.-P., Chang, Y.-M., Lai, C.-J., & Pai, P.-F. (2023). A Study of Optimization in Deep Neural Networks for Regression. Electronics, 12(14), 3071. https://doi.org/10.3390/electronics12143071
Khan, M. A., Azim, A., Liscano, R., Smith, K., Chang, Y.-K., Seferi, G., & Tauseef, Q. (2024). On the effectiveness of feature selection techniques in the context of ML-based regression test prioritization. IEEE Access, 12, 131556–131575. https://doi.org/10.1109/ACCESS.2024.3459656
IMDb Top 5000 Movies [Data set]. Kaggle. https://www.kaggle.com/datasets/tiagoadrianunes/imdb-top-5000-movies/data
Xu, Y. (2025). Deep regularization techniques for improving robustness in noisy record linkage task. Advances in Engineering Innovation, 15, 9–13. https://doi.org/10.54254/2977-3903/2025.20435
Somepalli, G., Goldblum, M., Schwarzschild, A., Bruss, C. B., & Goldstein, T. (2021). SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training. Advances in Neural Information Processing Systems, 34, 11237–11250. https://doi.org/10.48550/arXiv.2106.01342
Zhang, Y., Xiong, F., Xie, Y., Fan, X., & Gu, H. (2020). The impact of artificial intelligence and blockchain on the accounting profession. IEEE Access, 8, 110461–110477. https://doi.org/10.1109/ACCESS.2020.3000505
Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. Springer. https://doi.org/10.1007/978-1-4614-6849-3
Altman, N., & Krzywinski, M. (2018). The curse(s) of dimensionality. Nature Methods, 15, 399–400. https://doi.org/10.1038/s41592-018-0019-x
Tiep, N. H., Jeong, H.-Y., Kim, K.-D., Xuan Mung, N., Dao, N.-N., Tran, H.-N., Hoang, V.-K., Ngoc Anh, N., & Vu, M. T. (2024). A New Hyperparameter Tuning Framework for Regression Tasks in Deep Neural Network: Combined-Sampling Algorithm to Search the Optimized Hyperparameters. Mathematics, 12(24), 3892. https://doi.org/10.3390/math12243892


Artificial Intelligence (Штучний інтелект)

Scientific journal
