Artificial intelligence

Scientific journal

ISSN 2710-1673

ONLINE: ISSN 2710-1681


Pruning of Convolutional Neural Networks Using Interpretability of Kolmogorov-Arnold Networks

Yefanov I.¹, Shapoval N.¹
¹ National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»
efanov.illiya@lll.kpi.ua; shovgun@gmail.com

UDC: 004.93
Publication Language: English
Stuc. intelekt. 2025; 30(4):100-106

Abstract: The paper addresses over-parameterization in deep Convolutional Neural Networks (CNNs), which hinders their deployment in Edge AI systems. Traditional compression methods, such as magnitude-based pruning, often remove functionally important components because they do not account for the semantic contribution of filters to the final decision. This study proposes a structural pruning method that leverages the interpretability of Kolmogorov-Arnold Networks (KANs). A hybrid CNN-KAN architecture is developed in which the KAN layer acts as an "interpretable bottleneck," allowing the importance of convolutional features to be analyzed through learned B-spline coefficients. An importance criterion based on the maximum weighted L2-norm of the spline coefficients is formalized, and an iterative pruning algorithm with adaptive fine-tuning is developed. Experiments on the CIFAR-10 dataset show that the proposed method achieves a compression ratio of 1.33× (reducing parameters from 11.2 million to 8.4 million) while maintaining an accuracy of 90.68%. This result outperforms classical pruning by 1.56% and is competitive with knowledge distillation methods.
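
The abstract does not give the criterion in closed form, so the following is only an illustrative sketch of one plausible reading of "maximum weighted L2-norm of spline coefficients". It assumes that each KAN input corresponds to one convolutional channel, that every edge from input i to output j carries a vector of B-spline coefficients and a scalar weight, and that a channel's score is the largest weighted norm over its outgoing edges. The function names, the data layout, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def channel_importance(coeffs, weights):
    """Score each input channel by its largest weighted spline-coefficient norm.

    Assumed layout (hypothetical): coeffs[j][i] is the list of B-spline
    coefficients on the edge from input i to output j, and weights[j][i]
    is the scalar weight applied to that edge.
    """
    n_out = len(coeffs)
    n_in = len(coeffs[0])
    scores = []
    for i in range(n_in):
        best = 0.0
        for j in range(n_out):
            l2 = math.sqrt(sum(c * c for c in coeffs[j][i]))
            best = max(best, abs(weights[j][i]) * l2)
        scores.append(best)
    return scores


def channels_to_prune(scores, fraction):
    """Return the indices of the lowest-scoring channels (one pruning step)."""
    k = int(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:k])


def compression_ratio(params_before, params_after):
    """Ratio of parameter counts before and after pruning."""
    return params_before / params_after


# Toy example: 2 KAN outputs, 3 input channels, 2 coefficients per edge.
coeffs = [
    [[3.0, 4.0], [0.1, 0.0], [1.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.2], [0.0, 2.0]],
]
weights = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 0.5],
]

scores = channel_importance(coeffs, weights)
print(scores)
print(channels_to_prune(scores, 0.5))
print(round(compression_ratio(11.2e6, 8.4e6), 2))
```

In an iterative scheme of the kind the abstract describes, a step like `channels_to_prune` would be followed by fine-tuning and a fresh round of scoring. The last line reproduces the reported 1.33× figure from the stated parameter counts (11.2 million and 8.4 million).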

Keywords: neural networks, pruning, Kolmogorov-Arnold Networks, KAN, model compression, interpretability, B-splines, Edge AI.

