Detection of near dublicates in tables based on the locality-sensitive hashing method and the nearest neighbor method

Lizunov, Petro; Biloshchytskyi, Andrii; Kuchansky, Alexander; Biloshchytska, Svitlana; Chala, Larysa

doi:10.15587/1729-4061.2016.86243

Cited by 21 publications

(17 citation statements)

References 12 publications

“…This technique is based on the assumption that the abstracts of scientific publications that relate to the same scientific direction will contain the same concepts and keywords, that is, they will be quite similar in content. The method for determining closeness between fragments of text information, but when applied to the task on finding incomplete duplicates, was described in papers [5,6].…”

Section: A Methods For the Clustering Of Publications Of Scientists By (mentioning)

confidence: 99%

“…Using a method of locally-sensitive hashing, in accordance with the method described in paper [5], we shall determine index elements:…”

Section: A Methods For the Clustering Of Publications Of Scientists By (mentioning)

confidence: 99%

“…Paper [5] describes a method for determining the degree of similarity of scientific texts based on the method of locally-sensitive hashing for finding incomplete duplicates in the scientific articles. This task can be used to establish similarities between publications at the clustering stage of these publications.…”

Section: Literature Review and Problem Statement (mentioning)

confidence: 99%

“…This task can be used to establish similarities between publications at the clustering stage of these publications. Article [6] presents a conceptual model of the automated system of finding incomplete duplicates, which is used for the implementation of methods outlined in paper [5]. This conceptual model can be used to implement a method of clustering of scientific publications, which is based on an analysis of similarities in the content of these publications.…”

Section: Literature Review and Problem Statement (mentioning)

confidence: 99%

A method for the identification of scientists' research areas based on a cluster analysis of scientific publications

Biloshchytskyi, Kuchansky, Andrashko et al. 2017

EEJET

Self Cite

A METHOD FOR THE IDENTIFICATION OF SCIENTISTS' RESEARCH AREAS BASED ON A CLUSTER ANALYSIS OF SCIENTIFIC PUBLICATIONS

Information Technology. UDC 005.8

Authors: A. Biloshchytskyi, A. Kuchansky, Yu. Andrashko, S. Biloshchytska, O. Kuzka, Ye. Shabala, T. Lyashchenko (2017)

A. Biloshchytskyi, Doctor of Technical Sciences, Professor, Department of Network and Internet Technologies, Taras Shevchenko National University of Kyiv, Volodymyrska str., 60, Kyiv, Ukraine, 01033. E-mail: bao1978@gmail.com

A. Kuchansky, PhD, Associate Professor, Department of Cybersecurity and Computer Engineering, Kyiv National University of Construction and Architecture, Povitroflotskyi ave., 31, Kyiv, Ukraine, 03037. E-mail: kuczanski@gmail.com

Yu. Andrashko, Lecturer, Department of System Analysis and Optimization Theory, Uzhhorod National University, Narodna sq., 3, Uzhhorod, Ukraine, 88000. E-mail: yurii.andrashko@uzhnu.edu.ua

S. Biloshchytska

T. Lyashchenko, Senior Lecturer, Department of Information Technologies, Kyiv National University of Construction and Architecture, Povitroflotskyi ave., 31, Kyiv, Ukraine, 03037. E-mail: liazschenko@ukr.net

Abstract: A method is proposed for clustering scientists' publications by research area. Within this method, two ways of finding the distance between publications are proposed. The first uses the length of the path between publications in the citation graph. The second computes the similarity between publication abstracts based on the locality-sensitive hashing method. A method for identifying scientists' research areas, based on the results of clustering scientific publications, is also proposed.

Keywords: clustering, research area, citation graph, locality-sensitive hashing

Introduction: Economic growth and prosperity of any country depend largely on the development of science, technology, efficiency of the productive forces of society, etc. Certainly, scientific and technical progress is impossible without ensuring the quality and efficiency of scientific research. Therefore, the establishment of criteria for the evaluation of research activities, with emphasis on the analysis of scientific areas tackled by researchers, is an important task for scientific and educational institutions, for companies engaged in the creation of science-intensive technologies, and for the state in general.

“…In particular, authors of work [18] have developed a method for detecting incomplete duplicates in tables which is based on the methods of the nearest neighbor and locally sensitive hashing. In work [19], a conceptual model of the system for finding incomplete duplicates using identification of similarities in electronic documents is described.…”

Section: Literature Review and Problem Statement (mentioning)

confidence: 99%

Development of adaptive combined models for predicting time series based on similarity identification

Kuchansky, Biloshchytskyi, Andrashko et al. 2018

EEJET

Self Cite

DEVELOPMENT OF ADAPTIVE COMBINED MODELS FOR PREDICTING TIME SERIES BASED ON SIMILARITY IDENTIFICATION

A. Kuchansky, PhD, Associate Professor, Department of Cybersecurity and Computer Engineering. E-mail: kuczanski@gmail.com

A. Biloshchytskyi

Yu. Andrashko, Lecturer, Department of System Analysis and Optimization Theory, Uzhhorod National University, Narodna sq., 3, Uzhhorod, Ukraine, 88000

S. Biloshchytska

Identity Based Privacy Information Sharing with Similarity Test in Cloud Environment

Wang, Zheng et al. 2018

Cloud Computing and Security
