Improving regression forecasting using hybrid ensemble-neural network models
UDC: 004.9
Publication language: English
Stuc. intelekt. 2025; 30; (2):96-103
Abstract: Regression forecasting on large, noisy datasets often forces a choice between classical machine learning algorithms and modern neural network methods. Classical methods are simpler and more interpretable, whereas neural networks handle heterogeneous, high-dimensional data better but require more computing resources and are harder to fine-tune. This paper presents a comparative analysis of Random Forest (RF), XGBoosting, and Dense Neural Network (DNN) regression models for large tabular datasets, using the IMDb dataset from the Kaggle platform. Special attention is given to whether prediction performance can be improved by combining the RF and XGBoosting ensemble methods with DNN models. The RF model showed acceptable predictive quality, with a coefficient of determination (R²) of 0.8640. The XGBoosting-based model performed considerably better, with an R² of 0.9245. The baseline DNN model reached an R² of 0.8990, which rose to 0.9179 after hyperparameter optimization. A hybrid approach is proposed as an additional way to improve the DNN model: the feature importance distributions determined by the RF and XGBoosting methods are used as weighting coefficients for the DNN feature vector. This produced the most accurate forecasts, with R² values of 0.9283 and 0.9302 for the RF-DNN and XGBoosting-DNN hybrid models, respectively. The results can be used to develop predictive models for heterogeneous, high-dimensional tabular data.
Keywords: forecasting, model efficiency, machine learning, ensemble methods, dense neural networks, feature engineering
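The abstract describes the hybrid step only in outline, so the following is a minimal sketch of one possible reading rather than the authors' implementation. It assumes that the importance scores are normalized to sum to 1 and that each feature column is multiplied by its weight before being passed to the DNN; the exact weighting scheme is not stated in the abstract. The function names and the sample numbers are hypothetical. In practice the scores would come from a fitted ensemble model (for example, the `feature_importances_` attribute of a scikit-learn random forest). The coefficient of determination used to compare the models is included for reference.

```python
from typing import List, Sequence


def normalize_importances(importances: Sequence[float]) -> List[float]:
    """Scale non-negative importance scores so that they sum to 1."""
    total = sum(importances)
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return [score / total for score in importances]


def weight_features(rows: Sequence[Sequence[float]],
                    importances: Sequence[float]) -> List[List[float]]:
    """Multiply every feature column by its normalized importance.

    Assumption: this is one plausible form of the weighting step;
    the abstract does not give the exact formula.
    """
    weights = normalize_importances(importances)
    weighted = []
    for row in rows:
        if len(row) != len(weights):
            raise ValueError("row length must match the number of importances")
        weighted.append([value * w for value, w in zip(row, weights)])
    return weighted


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical importance scores for three features (not from the paper).
example_importances = [2.0, 1.0, 1.0]
example_rows = [[4.0, 8.0, 2.0]]
weighted_rows = weight_features(example_rows, example_importances)
```

The weighted rows would then replace the raw feature vectors as DNN input. Because the scaling changes the relative magnitude of the columns, the features are assumed to be standardized beforehand, so that the weights rather than the original units determine each column's scale.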