Artificial intelligence

Scientific journal

ISSN 2710-1673

ONLINE: ISSN 2710-1681


About One Machine Learning Method For Paraphrase Identification

Marchenko O.1, Nykonenko A.2, Rossada T.2, Melnikov E.2
1 National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”
2 Taras Shevchenko National University of Kyiv

UDC: 68Т50
Publication Language: Ukrainian
Stuc. intelekt. 2016; 21(3):128-136

Abstract: A new, effective algorithm for paraphrase identification has been developed using a machine learning approach. The system is organized as a multilayer classifier: lower-level sub-classifiers each decide, according to their own strategy, whether the sentences are paraphrases, and an upper-level super-classifier combines their decisions into the final answer. Experiments demonstrated paraphrase detection precision comparable to that of the best state-of-the-art systems.
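
The two-level architecture described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three sub-classifier strategies (word overlap, bigram overlap, length ratio), their thresholds, the perceptron used as the super-classifier, and the toy sentence pairs are all assumptions chosen only to show how lower-level decisions might feed an upper-level learner.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def _tokens(sentence: str) -> List[str]:
    return sentence.lower().split()


def _jaccard(a: set, b: set) -> float:
    return len(a & b) / max(len(a | b), 1)


def word_overlap(pair: Pair) -> int:
    """Hypothetical strategy 1: Jaccard overlap of word sets is at least 0.5."""
    return int(_jaccard(set(_tokens(pair[0])), set(_tokens(pair[1]))) >= 0.5)


def bigram_overlap(pair: Pair) -> int:
    """Hypothetical strategy 2: Jaccard overlap of word bigrams is at least 0.3."""
    def bigrams(sentence: str) -> set:
        t = _tokens(sentence)
        return set(zip(t, t[1:]))
    return int(_jaccard(bigrams(pair[0]), bigrams(pair[1])) >= 0.3)


def length_ratio(pair: Pair) -> int:
    """Hypothetical strategy 3: the shorter sentence has at least 70% of the longer one's words."""
    la, lb = len(_tokens(pair[0])), len(_tokens(pair[1]))
    return int(min(la, lb) / max(la, lb, 1) >= 0.7)


# Lower level: each sub-classifier votes 1 (paraphrase) or 0 (not a paraphrase).
SUB_CLASSIFIERS = [word_overlap, bigram_overlap, length_ratio]


def sub_decisions(pair: Pair) -> List[int]:
    return [clf(pair) for clf in SUB_CLASSIFIERS]


def train_super(pairs: Sequence[Pair], labels: Sequence[int], epochs: int = 20):
    """Upper level (assumed here to be a perceptron) trained on the sub-classifier votes."""
    w = [0.0] * len(SUB_CLASSIFIERS)
    b = 0.0
    for _ in range(epochs):
        for pair, y in zip(pairs, labels):
            x = sub_decisions(pair)
            pred = int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)
            err = y - pred  # 0 when correct, +1 or -1 when wrong
            w = [wi + err * xi for wi, xi in zip(w, x)]
            b += err
    return w, b


def predict(pair: Pair, w: Sequence[float], b: float) -> int:
    x = sub_decisions(pair)
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Toy training data, invented purely for illustration.
train_pairs = [
    ("the cat sat on the mat", "the cat sat on a mat"),
    ("stocks fell sharply on monday", "he enjoys hiking in the alps"),
    ("she bought a new car yesterday", "yesterday she bought a new car"),
    ("it rained", "the committee approved the annual budget after a long debate"),
]
train_labels = [1, 0, 1, 0]

weights, bias = train_super(train_pairs, train_labels)
print(predict(("the dog chased the ball", "the dog chased a ball"), weights, bias))
```

On this toy data the super-classifier learns to discount the length-ratio vote, which fires for unrelated sentences of similar length. The sketch is meant to show why a learned upper level can be preferable to a fixed majority vote, not to reproduce the reported results.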

Keywords: machine learning, natural language text processing, paraphrase identification

