Artificial Intelligence (Штучний інтелект)

Scientific journal

ISSN 2710-1673

ONLINE: ISSN 2710-1681


A Study of Two-Stage Knowledge Distillation and Hybrid Compression of Convolutional Neural Networks to Reduce Their Computational Complexity

Kozynets A.Yu.1, Demkiv L.S.1
1 Ivan Franko National University of Lviv
andrian.kozynets@gmail.com; lidia.demkiv@gmail.com

UDC: 004.032.26:004.89:004.93
Publication language: English
Stuc. intelekt. 2026; 31; (1):59-69

Abstract: The article addresses the problem of effective knowledge transfer between deep convolutional neural networks of different capacities for subsequent deployment on hardware platforms with limited computing resources. The main weakness of standard knowledge distillation protocols is the "gradient shock" that occurs when the student model is initialized on specific datasets; it destroys the feature space and lowers final accuracy. To overcome this limitation, a Two-Stage Distillation algorithm was developed and implemented. The proposed approach divides training into a classifier stabilization phase and a deep distillation phase. Experiments were conducted on the ResNet and VGG architecture families. The results confirm that the proposed algorithm increases the accuracy of compact models by 1.5–2.6% compared to standard training. In addition, the work investigates and experimentally confirms the phenomenon of "distillation recovery": the ability of the algorithm to restore model accuracy after aggressive structural pruning. It is shown that using the teacher's "soft targets" in a narrowed search space allows the sparse ResNet18 model to reach an accuracy of 77.8%, which exceeds that of the baseline full-size student model.

Keywords: knowledge distillation, neural networks, gradient shock, model compression, ResNet, VGG
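
The soft-target idea mentioned in the abstract can be illustrated with a minimal sketch of the classic temperature-scaled distillation loss. This is not the authors' implementation: the abstract does not give the loss function, the hyperparameters, or the phase boundaries. The names `kd_loss` and `trainable_groups`, the values `temperature=4.0`, `alpha=0.5`, and `warmup_epochs=5`, and the interpretation of the first phase as "train only the classifier" are all assumptions made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, computed via a stable log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Classic soft-target loss: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE.

    temperature and alpha are illustrative defaults, not values from the article.
    """
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    # Ordinary cross-entropy of the student against the hard label (T = 1).
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


def trainable_groups(epoch, warmup_epochs=5):
    """Hypothetical two-phase schedule: classifier only first, then all layers.

    The group names and the warmup length are assumptions for illustration.
    """
    if epoch < warmup_epochs:
        return ["classifier"]
    return ["backbone", "classifier"]
```

A higher temperature flattens the teacher distribution, so the student also receives signal about the relative similarity of the non-target classes. In this sketch, updating only the classifier during the first epochs is one plausible way to keep large early gradients away from the feature extractor; whether this matches the article's stabilization phase cannot be determined from the abstract.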

