Pruning of Convolutional Neural Networks Using Interpretability of Kolmogorov-Arnold Networks
UDC: 004.93
Publication Language: English
Stuc. intelekt. 2025; 30(4):100-106
Abstract: The paper addresses over-parameterization in deep Convolutional Neural Networks (CNNs), which hinders their deployment in Edge AI systems. Traditional compression methods, such as magnitude-based pruning, often remove functionally important components because they do not account for the semantic contribution of filters to the final decision. This study proposes a structural pruning method that leverages the interpretability of Kolmogorov-Arnold Networks (KAN). A hybrid CNN-KAN architecture is developed in which the KAN layer acts as an "interpretable bottleneck," so that the importance of convolutional features can be analyzed through the learned B-spline coefficients. An importance criterion based on the maximum weighted L2-norm of the spline coefficients is formalized, and an iterative pruning algorithm with adaptive fine-tuning is built on it. Experiments on the CIFAR-10 dataset show that the proposed method achieves a compression ratio of 1.33× (reducing parameters from 11.2 million to 8.4 million) while maintaining an accuracy of 90.68%. This result outperforms classical pruning by 1.56% and is competitive with knowledge distillation methods.
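The abstract names the importance criterion but does not spell it out, so the following is only an illustrative sketch of one possible reading, not the authors' implementation. It assumes that each edge from convolutional feature i to KAN output j carries a list of B-spline coefficients and a scalar weight, and that the score of feature i is the maximum over j of the weight times the L2-norm of that edge's coefficients. The function names, the nested-list layout, and the choice of weight are all hypothetical. The last helper simply reproduces the compression-ratio arithmetic from the reported parameter counts.

```python
import math


def feature_importance(coeffs, scales):
    """Score each input feature of a KAN layer (assumed reading of the criterion).

    coeffs[j][i] : list of B-spline coefficients on the edge i -> j (hypothetical layout)
    scales[j][i] : scalar weight on the same edge (assumed meaning of "weighted")
    Returns one score per input feature: max over outputs j of |scale| * ||coeffs||_2.
    """
    n_out = len(coeffs)
    n_in = len(coeffs[0])
    scores = []
    for i in range(n_in):
        best = 0.0
        for j in range(n_out):
            l2 = math.sqrt(sum(c * c for c in coeffs[j][i]))
            best = max(best, abs(scales[j][i]) * l2)
        scores.append(best)
    return scores


def lowest_scoring(scores, k):
    """Indices of the k features with the smallest scores (pruning candidates)."""
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]


def compression_ratio(params_before, params_after):
    """Ratio of parameter counts before and after pruning."""
    return params_before / params_after


# Toy example: 2 KAN outputs, 2 input features, 2 coefficients per edge.
toy_coeffs = [
    [[3.0, 4.0], [0.0, 0.1]],  # edges into output 0
    [[0.0, 1.0], [0.2, 0.0]],  # edges into output 1
]
toy_scales = [
    [1.0, 1.0],
    [2.0, 1.0],
]
toy_scores = feature_importance(toy_coeffs, toy_scales)
candidates = lowest_scoring(toy_scores, 1)

# Parameter counts reported in the abstract: 11.2 million before, 8.4 million after.
ratio = compression_ratio(11.2e6, 8.4e6)
```

In the toy example the second feature has small coefficients on every outgoing edge, so it is the one selected as a pruning candidate; in an iterative scheme such as the one described above, the selection would alternate with fine-tuning. The ratio computed from the reported counts rounds to the stated 1.33.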
Keywords: neural networks, pruning, Kolmogorov-Arnold Networks, KAN, model compression, interpretability, B-splines, Edge AI.
References:
- Liu, Z., Wang, Y., Vaidya, S., et al. (2024). KAN: Kolmogorov-Arnold Networks. arXiv preprint arXiv:2404.19756.
- Blealtan. (2024). efficient-kan: An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network. GitHub repository. https://github.com/Blealtan/efficient-kan
- Kolmogorov, A. N. (1957). On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR, 114(5), 953-956.
- Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both weights and connections for efficient neural networks. In Advances in Neural Information Processing Systems (pp. 1135-1143).
- Frankle, J., & Carbin, M. (2019). The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations.
- Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
- LeCun, Y., Denker, J. S., & Solla, S. A. (1990). Optimal brain damage. In Advances in Neural Information Processing Systems (pp. 598-605).
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
- De Boor, C. (1972). On calculating with B-splines. Journal of Approximation Theory, 6(1), 50-62.
- Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456).
- Blalock, D., Ortiz, J. J. G., Frankle, J., & Guttag, J. (2020). What is the state of neural network pruning? Proceedings of Machine Learning and Systems, 2, 129-146.
- Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2021). Knowledge distillation: A survey. International Journal of Computer Vision, 129, 1789-1819.
- Paszke, A., Gross, S., Massa, F., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (pp. 8026-8037).
- Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International Conference on Learning Representations.
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929-1958.
- Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4), 303-314.
- Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.