Artificial Intelligence

Scientific journal

ISSN 2710-1673

ONLINE: ISSN 2710-1681


Hashtag recommendation using deep learning methods based on multimodal data

Яковлєв С.О.1, Шаповал Н.1
1 National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
se2001ya@gmail.com; shovgun@gmail.com

UDC: 004.93
Publication language: English
Stuc. intelekt. 2024; 29; (4):41-48

Abstract: Image captioning is an important task that aims to automatically generate a text description for an image. Hashtag recommendation is a practical variant of this task. Hashtags make content more relevant to its audience and improve the visibility of publications. Choosing optimal hashtags is especially relevant for social platforms, where users generate huge amounts of content in different modalities, such as images, text captions, and videos. Several challenges need to be addressed when solving this problem. First, text captions for posts are often short or even absent. Second, multimodal algorithms often do not take the user's previous activity into account, which can significantly limit the quality of recommendations. Third, the balance between the importance of textual and visual cues may vary depending on the nature of the publication. The purpose of this study is to develop a modified feature fusion algorithm for multimodal hashtag recommendation that takes into account the context of the user's previous history, adaptively evaluates the importance of textual and visual features, and improves the quality of recommendations when the text description is absent or weak. As part of the study, a model was modified so that, in addition to analyzing the image and text caption, it takes into account part of the user's previous interaction history. The main contribution is a new feature fusion module that weights the features' importance depending on the context. This approach improves the relevance of recommendations when the textual modality is not informative enough, which is a common problem in real data. The experimental results confirmed that the proposed feature fusion module provided more accurate hashtag recommendations, especially for posts with short or missing text captions.
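
The abstract does not specify how the fusion module computes its weights, so the following is only a minimal sketch of one common way to weight modalities by context: a softmax gate over the modalities that are present. The function `fuse_features`, the `gates` parameters (standing in for learned weights), and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_features(
    features: Dict[str, Optional[List[float]]],
    gates: Dict[str, List[float]],
) -> Tuple[List[float], Dict[str, float]]:
    """Fuse modality vectors with softmax weights (illustrative only).

    `features` maps a modality name to its feature vector, or to None
    when the modality is missing (for example, a post without a caption).
    `gates` holds one scoring vector per modality; in a real model these
    would be learned, here they are assumed constants.
    """
    # Missing modalities are dropped so they receive no weight at all.
    present = {name: vec for name, vec in features.items() if vec is not None}
    if not present:
        raise ValueError("at least one modality must be present")
    names = sorted(present)
    # One scalar relevance score per present modality.
    scores = [dot(gates[name], present[name]) for name in names]
    weights = dict(zip(names, softmax(scores)))
    dim = len(present[names[0]])
    # Weighted sum of the present modality vectors.
    fused = [
        sum(weights[name] * present[name][i] for name in names)
        for i in range(dim)
    ]
    return fused, weights


# Toy example: the caption is missing, so only image and history are fused.
gates = {"image": [0.5, 0.5], "text": [0.5, 0.5], "history": [0.5, 0.5]}
fused, weights = fuse_features(
    {"image": [1.0, 0.0], "text": None, "history": [0.0, 1.0]},
    gates,
)
```

In this sketch a missing caption simply drops out of the softmax, so the remaining weight is redistributed between the image and the user history. A trained model would more likely compute the scores with a small network over all modalities rather than with fixed gate vectors.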

Keywords: neural networks, deep learning, multimodal data, hashtags, social networks

