Zero-shot and few-shot named entity recognition for the Ukrainian language based on a modified GLiNER architecture
UDC: 004.93
Publication language: English
Stuc. intelekt. 2025; 30; (4):69-77
Abstract: This paper presents a method for named entity recognition (NER) for the Ukrainian language with limited training data, based on a modified GLiNER architecture. The proposed method combines the advantages of end-to-end span-based NER with architectural and training improvements aimed at low-resource settings. We replace the original DeBERTa encoder with a compact Snowflake Arctic Embed 2.0 encoder pretrained for retrieval, introduce a post-fusion cross-attention block between the text and the entity type descriptions, and use a lightweight span scoring module with GoLU activation. A new Ukrainian multi-domain NER corpus and an out-of-domain benchmark were created to evaluate the model. Experimental results show that the proposed method achieves F1 scores competitive with open-weight LLMs while being far cheaper to deploy, and that it approaches the quality of proprietary LLMs in few-shot settings.
Keywords: low-resource NLP tasks, named entity recognition, zero-shot learning, few-shot learning, retrieval
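
For readers unfamiliar with the GoLU activation mentioned in the abstract, the following is a minimal scalar sketch. It assumes the definition commonly given for Gompertz Linear Units, GoLU(x) = x * exp(-exp(-x)); the function names are illustrative, and nothing here reproduces the paper's span scoring module, whose details the abstract does not give.

```python
import math


def gompertz(x: float) -> float:
    """Gompertz gate exp(-exp(-x)); rises smoothly from 0 to 1."""
    # Cap the inner exponent so math.exp does not overflow for very negative x.
    return math.exp(-math.exp(min(-x, 700.0)))


def golu(x: float) -> float:
    """GoLU activation (assumed form): the input scaled by its Gompertz gate."""
    return x * gompertz(x)


if __name__ == "__main__":
    for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"golu({value:+.1f}) = {golu(value):+.6f}")
```

The gate is asymmetric around zero, so negative inputs are suppressed more strongly than in symmetric gated activations, while large positive inputs pass through almost unchanged.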