About One Machine Learning Method For Paraphrase Identification
UDC: 68Т50
Publication Language: Ukrainian
Stuc. intelekt. 2016; 21(3):128-136
Abstract: A new, effective algorithm for paraphrase identification has been developed using a machine learning approach. The system is built as a multilayer classifier: lower-level sub-classifiers each decide, according to their own strategy, whether a paraphrase is present in the sentences, and an upper-level super-classifier produces the final decision. Experiments demonstrated paraphrase detection precision comparable to that of the best state-of-the-art systems.
Keywords: machine learning, natural language text processing, paraphrase identification
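The two-level architecture described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the three sub-classifier strategies (word overlap, bigram overlap, length ratio), their thresholds, and the accuracy-weighted vote used as the super-classifier are all assumptions chosen only to show how lower-level decisions might feed an upper-level one.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
SubClassifier = Callable[[str, str], int]  # 1 = paraphrase, 0 = not


def _tokens(s: str) -> List[str]:
    return s.lower().split()


def word_overlap(a: str, b: str) -> int:
    """Hypothetical strategy 1: Jaccard overlap of word sets >= 0.5."""
    x, y = set(_tokens(a)), set(_tokens(b))
    union = x | y
    return int(bool(union) and len(x & y) / len(union) >= 0.5)


def bigram_overlap(a: str, b: str) -> int:
    """Hypothetical strategy 2: Jaccard overlap of word bigrams >= 1/3."""
    def bigrams(s: str) -> set:
        t = _tokens(s)
        return set(zip(t, t[1:]))
    x, y = bigrams(a), bigrams(b)
    union = x | y
    return int(bool(union) and len(x & y) / len(union) >= 1 / 3)


def length_ratio(a: str, b: str) -> int:
    """Hypothetical strategy 3: shorter/longer token count >= 0.7."""
    la, lb = len(_tokens(a)), len(_tokens(b))
    longest = max(la, lb)
    return int(longest > 0 and min(la, lb) / longest >= 0.7)


class SuperClassifier:
    """Upper level (assumed form): accuracy-weighted vote over sub-decisions."""

    def __init__(self, subs: Sequence[SubClassifier]):
        self.subs = list(subs)
        self.weights = [1.0] * len(self.subs)

    def fit(self, pairs: Sequence[Pair], labels: Sequence[int]) -> "SuperClassifier":
        n = len(pairs)
        if n == 0:
            return self  # nothing to learn from; keep uniform weights
        # Weight each sub-classifier by its accuracy on the labelled pairs.
        self.weights = [
            sum(sub(a, b) == y for (a, b), y in zip(pairs, labels)) / n
            for sub in self.subs
        ]
        return self

    def predict(self, a: str, b: str) -> int:
        votes = [sub(a, b) for sub in self.subs]
        score = sum(w * v for w, v in zip(self.weights, votes))
        # Final decision: weighted "yes" votes must exceed half the total weight.
        return int(score > sum(self.weights) / 2)
```

In this sketch the super-classifier is fitted on labelled sentence pairs and then asked for a final decision on a new pair, for example `SuperClassifier([word_overlap, bigram_overlap, length_ratio]).fit(pairs, labels).predict(s1, s2)`. A trained model such as logistic regression over the sub-classifier outputs would be a natural replacement for the weighted vote.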