Artificial intelligence

Scientific journal

ISSN 2710-1673

ONLINE: ISSN 2710-1681

Analysis of Semantic Distance Calculation Methods for Assessing the Effectiveness of Natural Language Chatbots

Klymenko M.1, Shash M.2
1 Institute of Artificial Intelligence Problems of MES and NAS of Ukraine
2 State University of Information and Communication Technologies
max.shash@gmail.com

UDC: 004.934.2
Publication Language: Ukrainian
Stuc. intelekt. 2025; 30(4):78-87

Abstract: The article presents a comprehensive analysis of modern methods for computing semantic distance between textual units, with the aim of evaluating the effectiveness of natural language chatbots. It traces the evolution of approaches to measuring semantic similarity, from classical lexico-statistical methods and static vector embeddings to contextualized deep learning models, in particular BERT and its derivatives. The empirical study was conducted on a dialog corpus from a virtual assistant designed to provide psychological support. The methods were assessed using quantitative metrics for intent classification and response selection, together with qualitative expert evaluation of response adequacy. The results demonstrate a significant advantage of contextualized models, particularly SimCSE-BERT, over traditional approaches such as word2vec and the base BERT model. The study shows that modern methods for computing semantic distance improve both the technical performance of chatbots and the user-perceived quality of interaction, which is critically important for scalable systems in psychological assistance and other applied domains.

Keywords: semantic similarity, natural language processing, chatbots, vector embeddings, BERT, psychological assistant.
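
As a minimal illustration of the response-selection setting mentioned in the abstract, the sketch below picks the candidate response whose embedding lies at the smallest cosine distance from the query embedding. It is not the authors' implementation: the function names, the response labels, and the three-dimensional vectors are hypothetical placeholders, and in practice the vectors would come from an embedding model such as word2vec, BERT, or SimCSE.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v): 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def select_response(query_vec, candidates):
    """Return the label of the candidate embedding closest to the query."""
    return min(
        candidates,
        key=lambda label: cosine_distance(query_vec, candidates[label]),
    )


# Hypothetical toy embeddings, for illustration only (not data from the study).
candidates = {
    "breathing_exercise": [0.9, 0.1, 0.0],
    "sleep_hygiene": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]

best = select_response(query, candidates)
print(best)
```

Under this setup, swapping the embedding model changes only how the vectors are produced, so the same selection step can be used to compare different embedding methods.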

