2018
DOI: 10.26555/ijain.v4i1.152

The performance of text similarity algorithms

Abstract: Text similarity measurement compares a text with available references to indicate the degree of similarity between them. Many studies of text similarity have produced a variety of approaches and algorithms. This paper investigates four major text similarity measurements: String-based, Corpus-based, Knowledge-based, and Hybrid similarities. The results of the investigation showed that the semantic similarity approach is more rational in finding substantial relationship between…

Citations: cited by 43 publications (20 citation statements)
References: 31 publications

“…So, every node is a knowledge unit in the network. The citation network was built through Sci2 tool software and then every referenced citation was chosen to identify articles with a 95% similarity through the [Jaro-Winkler] algorithm and to be able to remove duplicates (Prasetya et al., 2018). Finally, the citation network was updated through merge node.…”
Section: Methods
Citation type: mentioning
confidence: 99%
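
The duplicate-removal step described in this statement can be illustrated with a minimal sketch. It is not the citing authors' implementation (they report using the Sci2 tool); it is a plain-Python version of the standard Jaro-Winkler similarity, assuming the common defaults of a 0.1 prefix scale and a prefix capped at four characters. The function names and the greedy `drop_near_duplicates` helper are hypothetical, and the 0.95 threshold simply mirrors the 95% figure quoted above.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def drop_near_duplicates(titles, threshold=0.95):
    """Hypothetical helper: keep a title only if no kept title is too similar."""
    kept = []
    for title in titles:
        key = title.lower().strip()
        if all(jaro_winkler(key, k.lower().strip()) < threshold for k in kept):
            kept.append(title)
    return kept
```

The greedy pass keeps the first spelling it sees of each near-duplicate group, so the input order decides which variant survives.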
“…Character-based approaches use algorithms such as Smith-Waterman, N-gram, Damerau-Levenshtein, Jaro-Winkler, Longest Common Substring (LCS), and so on, whereas term-based approaches use algorithms such as block distance, Jaccard similarity, matching coefficient, and overlap coefficient [1]. In the mining approach, the algorithms used to find information in text include information retrieval (IR), text classification, information extraction (IE), document clustering, sentiment analysis, machine translation, text summarization, and natural language processing (NLP) [4].…”
Section: Text Similarity Algorithms
Citation type: unclassified
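
The character-based versus term-based distinction drawn in this statement can be made concrete with a small sketch. It uses plain Levenshtein distance (not the Damerau variant named above, which also counts transpositions) as the character-based example, and Jaccard similarity and the overlap coefficient over whitespace tokens as the term-based examples. All function names are hypothetical, and the whitespace tokenization is an assumption made for brevity.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-based: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def jaccard(a: str, b: str) -> float:
    """Term-based: shared tokens divided by all distinct tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def overlap_coefficient(a: str, b: str) -> float:
    """Term-based: shared tokens divided by the smaller token set."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))
```

Character-based measures react to spelling differences inside a word, while the term-based ones treat each token as atomic and ignore word order.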
“…Adequately, sequences extracted from event logs can be encoded by letters. Indeed, string similarity algorithms [14] can be used to comprehend the control-flow perspective of the process more informatively. In this sense, it is primordial to 1) define a coherent set of traces, 2) calculate the similarity between events and clusters, 3) generate the aligned sequences, and 4) visualize sequences with an informative score.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
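
The idea in this statement, encoding event-log sequences as letters so that string similarity can be applied to them, might look like the following sketch. It is not the citing authors' method: the helper names are hypothetical, the one-letter-per-activity scheme assumes at most 26 distinct activities, and the score comes from Python's standard `difflib.SequenceMatcher`, which is a matching-blocks ratio rather than any specific algorithm from the cited paper.

```python
import string
from difflib import SequenceMatcher


def encode_traces(traces):
    """Map each distinct activity to a letter, in order of first appearance."""
    alphabet = {}
    encoded = []
    for trace in traces:
        letters = []
        for activity in trace:
            if activity not in alphabet:
                if len(alphabet) >= len(string.ascii_uppercase):
                    raise ValueError("more than 26 distinct activities")
                alphabet[activity] = string.ascii_uppercase[len(alphabet)]
            letters.append(alphabet[activity])
        encoded.append("".join(letters))
    return encoded, alphabet


def trace_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two letter-encoded traces."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical event log: each trace is an ordered list of activity names.
log = [
    ["register", "check", "pay"],
    ["register", "pay"],
]
encoded, alphabet = encode_traces(log)
score = trace_similarity(encoded[0], encoded[1])
```

Once traces are strings, any of the character-based measures mentioned in the other statements could be swapped in for `trace_similarity`.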