Artificial Intelligence (Штучний інтелект)

Scientific journal

ISSN 2710-1673

ONLINE: ISSN 2710-1681

Zero-shot and few-shot named entity recognition for the Ukrainian language based on a modified GLiNER architecture

Kashperova S.V.1, Shapoval N.1
1 National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
kashperova.study@gmail.com; shovgun@gmail.com

UDC: 004.93
Publication language: English
Stuc. intelekt. 2025; 30; (4):69-77

Abstract: This paper presents a method for named entity recognition (NER) in Ukrainian under limited training data, based on a modified GLiNER architecture. The method keeps the advantages of end-to-end span-based NER and adds architectural and training improvements aimed at low-resource settings. We replace the original DeBERTa encoder with a compact Snowflake Arctic-Embed 2.0 encoder pretrained for retrieval, introduce a post-fusion cross-attention block between the text and the entity type descriptions, and use a lightweight span scoring module with GoLU activation. To evaluate the model, we created a new Ukrainian multi-domain NER corpus and an out-of-domain benchmark. Experimental results show that the proposed method achieves F1 scores competitive with open-weight LLMs while being far cheaper to deploy, and that it approaches the quality of proprietary LLMs in few-shot settings.

Keywords: low-resource NLP tasks, named entity recognition, zero-shot learning, few-shot learning, retrieval
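
The abstract mentions a lightweight span scoring module with GoLU activation. As a rough illustration only, the sketch below implements the Gompertz Linear Unit as GoLU(x) = x * exp(-exp(-x)) and uses it in a toy span scorer. The function `score_span`, its inputs, and the sigmoid-over-dot-product scoring are hypothetical and are not taken from the paper; the actual module may be structured quite differently.

```python
import math


def gompertz(x: float) -> float:
    """Gompertz gate exp(-exp(-x)), with values in [0, 1]."""
    # For very negative x, exp(-x) would overflow; the gate is ~0 there.
    if x < -700.0:
        return 0.0
    return math.exp(-math.exp(-x))


def golu(x: float) -> float:
    """Gompertz Linear Unit: the input scaled by the Gompertz gate."""
    return x * gompertz(x)


def score_span(span_vec: list[float], type_vec: list[float]) -> float:
    """Hypothetical span scorer (an assumption, not the paper's module).

    Applies GoLU element-wise to a span representation, takes a dot
    product with an entity-type embedding, and squashes the result
    with a sigmoid to get a match probability.
    """
    hidden = [golu(v) for v in span_vec]
    logit = sum(h * t for h, t in zip(hidden, type_vec))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Toy vectors chosen only for demonstration.
    print(golu(1.0))
    print(score_span([1.0, -0.5, 2.0], [0.5, 0.25, -0.1]))
```

In this sketch the gate is asymmetric: large positive inputs pass almost unchanged, while negative inputs are suppressed toward zero.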
