Method of constructing a text template for extracting information from semistructured data
UDC: 004.9:371.261
Publication Language: Ukrainian
Stuc. intelekt. 2017; 22(2):60-69
Abstract: 80% of the world's data is unstructured or semistructured. The central problem is therefore extracting information and storing it in a form suitable for processing. To simplify data extraction, we propose using text templates based on a dictionary of keywords. The main goal is to develop a method for selecting the component elements of a text template, together with a method for clustering a text template. The developed methods are analyzed using the operation of a library system as an example.
Keywords: semistructured data, data extraction, text templates, clustering methods.