Semi-Supervised Learning of a Vision Transformer for Road Traffic Segmentation in an Unstructured Environment

https://doi.org/10.15407/jai2024.04.133

Shabo O.A.¹, Shapoval N.¹

¹ National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

andriyshabo@gmail.com; shovgun@gmail.com

https://orcid.org/0000-0002-8509-6886

UDC: 004.93
Publication language: English
Stuc. intelekt. 2024; 29; (4):133-140

Abstract: In the last few years, recurrent neural networks, traditionally used for natural language processing tasks, have largely been replaced by transformers. Thanks to the attention mechanism, transformers also take text as a sequence but deliver much better results than LSTM, GRU-based, or similar networks. Self-attention mitigates the problem of fading memory by allowing efficient evaluation of dependencies between distant tokens, and it parallelizes better on modern processing units such as GPUs. Until recently, the use of transformers for computer vision (CV) tasks was minimal. The biggest obstacles to progress in this field were the immense computational complexity, the fact that an image is a grid rather than a sequence like text, and the lack of a strong inductive bias, in other words, a weaker grasp of local correlations than their CNN counterparts have. The latter slowed the adoption of the vision transformer (ViT) in semantic segmentation (SS) even more. However, it was recently shown that with sufficient data, transformers can outperform CNN-based networks in image classification and, with the proper ViT structure, even in SS. A promising way to supply a ViT with the training data it requires is semi-supervised learning (SSL), which extracts useful information from unlabeled data while using only a small amount of labeled data. This approach is especially beneficial for SS, since manually creating masks for images is very time-consuming. This paper proposes a robust semi-supervised ViT learning method that uses minimal labeled data. The combination of a strong augmentation pipeline and a dual-teacher paradigm yields good performance for SS of road traffic in an unstructured environment without the need for an extensive hyperparameter search.

Keywords: semi-supervised learning, vision transformer, semantic segmentation, unstructured environment
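
To make the semi-supervised idea in the abstract concrete, the following is a minimal sketch of a generic teacher-student pseudo-labeling step for segmentation. It is not the authors' dual-teacher method: the function names `pseudo_labels` and `ema_update`, the `IGNORE` marker, the confidence threshold of 0.9, and the EMA decay are all assumptions chosen for illustration. A teacher's per-pixel class probabilities on an unlabeled image become hard labels only where the teacher is confident, and the teacher's weights follow the student's as an exponential moving average.

```python
IGNORE = -1  # hypothetical marker for pixels excluded from the unsupervised loss


def pseudo_labels(probs, threshold=0.9):
    """Turn per-pixel class probabilities into hard pseudo-labels.

    probs: list of rows; each row is a list of pixels; each pixel is a
    list of class probabilities. Pixels whose top probability is below
    `threshold` are marked IGNORE.
    """
    labels = []
    for row in probs:
        out_row = []
        for pixel in row:
            # Index of the most probable class for this pixel.
            best = max(range(len(pixel)), key=pixel.__getitem__)
            out_row.append(best if pixel[best] >= threshold else IGNORE)
        labels.append(out_row)
    return labels


def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight (EMA)."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy 2x2 "image" with 3 classes; values stand in for teacher outputs.
teacher_probs = [
    [[0.95, 0.03, 0.02], [0.40, 0.35, 0.25]],
    [[0.05, 0.92, 0.03], [0.10, 0.10, 0.80]],
]
labels = pseudo_labels(teacher_probs, threshold=0.9)
print(labels)  # [[0, -1], [1, -1]]
```

In this toy input only two of the four pixels pass the threshold, which illustrates why filtering matters when labeled masks are scarce: low-confidence pixels are skipped rather than trained on as if they were ground truth.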

Штучний інтелект (Artificial Intelligence)

Scientific journal
