Study of two-stage knowledge distillation and hybrid compression of convolutional neural networks to reduce their computational complexity

https://doi.org/10.15407/jai2026.01.059

Kozynets A.Yu.¹, Demkiv L.S.¹

¹ Ivan Franko National University of Lviv

andrian.kozynets@gmail.com; lidia.demkiv@gmail.com

https://orcid.org/0009-0004-7994-3534 https://orcid.org/0009-0002-0185-6364

UDC: 004.032.26:004.89:004.93
Publication language: English
Stuc. intelekt. 2026; 31; (1):59-69

Abstract: The article addresses the problem of effective knowledge transfer between deep convolutional neural networks of different capacities for subsequent deployment on hardware platforms with limited computing resources. The main weakness of standard knowledge distillation protocols is the "gradient shock" that occurs when the student model is initialized on specific datasets, which destroys the feature space and reduces final accuracy. To overcome this limitation, the Two-Stage Distillation algorithm was developed and implemented. The proposed approach divides training into a classifier stabilization phase and a deep distillation phase. Experiments were conducted on the ResNet and VGG architecture families. The results confirm that the proposed algorithm increases the accuracy of compact models by 1.5–2.6% compared to standard training. In addition, the work investigates and experimentally confirms the phenomenon of "distillation recovery": the ability of the algorithm to restore model accuracy after aggressive structural pruning. It is shown that using the teacher's "soft targets" in a narrowed search space allows the sparse ResNet18 model to reach an accuracy of 77.8%, which exceeds that of the baseline full-size student model.
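The abstract does not specify what each of the two phases optimizes, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the stabilization phase trains on hard labels alone and that the distillation phase adds the commonly used temperature-softened KL term against the teacher's soft targets. The names `two_stage_loss`, `warmup_epochs`, `alpha`, and `temperature`, along with their default values, are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def hard_label_loss(student_logits, label):
    """Cross-entropy of the student against the ground-truth class index."""
    return -log_softmax(student_logits)[label]


def soft_target_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


def two_stage_loss(student_logits, teacher_logits, label, epoch,
                   warmup_epochs=5, alpha=0.7, temperature=4.0):
    """Per-sample loss that switches stage at `warmup_epochs` (assumed)."""
    ce = hard_label_loss(student_logits, label)
    if epoch < warmup_epochs:
        # Stage 1 (assumption): stabilize the classifier on hard labels only,
        # so the teacher signal cannot disturb freshly initialized weights.
        return ce
    # Stage 2 (assumption): blend hard-label loss with the soft-target term.
    kd = soft_target_loss(student_logits, teacher_logits, temperature)
    return (1.0 - alpha) * ce + alpha * kd


if __name__ == "__main__":
    student = [2.0, 1.0, 0.1]
    teacher = [3.0, 0.5, -1.0]
    for epoch in (0, 10):
        print(epoch, two_stage_loss(student, teacher, label=0, epoch=epoch))
```

In a deep learning framework the first stage might additionally freeze the student backbone and update only the classifier head; that detail is likewise an assumption and is not stated in the abstract.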

Keywords: knowledge distillation, neural networks, gradient shock, model compression, ResNet, VGG

Artificial Intelligence

Scientific journal