Artificial Intelligence

Scientific journal

ISSN 2710-1673

ONLINE: ISSN 2710-1681


Semi-supervised training of a vision transformer for road traffic segmentation in an unstructured environment

Shabo O.A.1, Shapoval N.1
1 National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
andriyshabo@gmail.com; shovgun@gmail.com

Full text (PDF)

UDC: 004.93
Publication language: English
Stuc. intelekt. 2024; 29; (4):133-140

Abstract: In the last few years, recurrent neural networks, traditionally used for natural language processing tasks, have been largely replaced by transformers. Thanks to the attention mechanism, transformers also take text as a sequence of tokens but provide much better results than LSTM, GRU, or similar networks. Self-attention alleviates the problem of fading memory by allowing efficient evaluation of dependencies between distant tokens, and it parallelizes better on modern processing units such as GPUs. Until recently, the use of transformers for computer vision (CV) tasks was limited. The biggest obstacles to progress in this field were the immense computational complexity, the fact that an image is a grid rather than a sequence like text, and the lack of a strong inductive bias: unlike their CNN counterparts, transformers have no built-in grasp of local correlations. The last of these slowed the adoption of the vision transformer (ViT) in semantic segmentation (SS) even further. However, it was recently shown that, with sufficient data, transformers can outperform CNN-based networks in image classification and, with a suitable ViT structure, even in SS. A promising way to provide a ViT with the required training data is semi-supervised learning (SSL), which extracts useful information from unlabeled data while using only a small amount of labeled data. This approach is especially beneficial for SS, since manually creating masks for images is very time-consuming. This paper proposes a robust semi-supervised ViT learning method that uses minimal labeled data. The combination of a strong augmentation pipeline and a dual-teacher paradigm achieves good performance for SS of road traffic in an unstructured environment without the need for an extensive hyperparameter search.

Keywords: semi-supervised learning, vision transformer, semantic segmentation, unstructured environment
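
The abstract's remark that self-attention relates distant tokens directly can be illustrated with a minimal sketch. This is not the model proposed in the paper: it is a generic single-head scaled dot-product self-attention, and it assumes identity query/key/value projections (real transformers use learned projection matrices). The function names `softmax` and `self_attention` are illustrative only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), purely to keep the sketch short.
    tokens: list of equal-length float vectors.
    Returns (outputs, weights).
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in tokens:
        # Every query is compared with every key, regardless of position.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))
    # Each output is a weighted average of all value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


if __name__ == "__main__":
    # The first and last tokens are identical but far apart in the sequence.
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    _, weights = self_attention(tokens)
    print(weights[0])
```

In this toy input the first token attends to the last token exactly as strongly as to itself, since the attention weight depends on content similarity and not on distance. All pairwise scores are independent of each other, which is what makes the computation easy to parallelize.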

