2014
DOI: 10.1145/2627692.2627706

State-of-the-art in string similarity search and join

Abstract: String similarity search and its variants are fundamental problems with many applications in areas such as data integration, data quality, computational linguistics, or bioinformatics. A plethora of methods have been developed over the last decades. Obtaining an overview of the state-of-the-art in this field is difficult, as results are published in various domains without much cross-talk, papers use different data sets and often study subtle variations of the core problems, and the sheer number of proposed me…
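The abstract separates similarity search (one query against a collection) from variants such as the similarity join named in the title. Below is a minimal sketch of a naive self-join. It assumes Levenshtein edit distance with unit costs and a distance threshold `k`. The names `edit_distance` and `similarity_self_join` are hypothetical, and the code is not presented as any method covered by the survey. It loops over all pairs and puts a cheap length filter in front of the quadratic verification.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute) via row-by-row DP."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_self_join(strings: list[str], k: int) -> list[tuple[str, str]]:
    """Every unordered pair (s, t) of input strings with edit_distance(s, t) <= k."""
    result = []
    for s, t in combinations(strings, 2):
        # Length filter: |len(s) - len(t)| is a lower bound on the edit distance.
        if abs(len(s) - len(t)) > k:
            continue
        if edit_distance(s, t) <= k:  # exact verification
            result.append((s, t))
    return result


print(similarity_self_join(["kitten", "sitting", "mitten", "bitten"], k=2))
```

The length filter only prunes pairs and never discards a true result, because turning s into t needs at least |len(s) - len(t)| insertions or deletions.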

Cited by 38 publications (15 citation statements)
References 26 publications
“…String similarity search is a well-known problem in data mining: given a set of strings and a query string, the goal is to find all strings within a given edit distance of the query [12], [13], [14], [15], [16], [17]. However, most related research focuses on building an index over a fixed-size set of strings to improve query performance [14], [15], [16], [17].…”
Section: A. Related Work (mentioning; confidence: 99%)
“…Given a query, finding all approximately matching strings in a string collection under an edit-distance constraint is a well-studied problem [15,20]. It includes two typical sub-problems.…”
Section: Related Work (mentioning; confidence: 99%)
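Without an index, the same query can be answered by scanning the collection and verifying each string. Below is a minimal sketch, assuming Levenshtein distance and the hypothetical names `within_edit_distance` and `similarity_search`. The verification abandons the dynamic-programming table as soon as an entire row exceeds k. This is safe because every alignment path crosses each row and its cost never decreases along the path.

```python
def within_edit_distance(query: str, candidate: str, k: int) -> bool:
    """True iff the Levenshtein distance between query and candidate is at most k."""
    if abs(len(query) - len(candidate)) > k:  # length filter
        return False
    prev = list(range(len(candidate) + 1))
    for i, qc in enumerate(query, start=1):
        curr = [i]
        for j, cc in enumerate(candidate, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (qc != cc)))
        # Every alignment path passes through row i and its cost never decreases,
        # so once the whole row exceeds k the final distance must exceed k too.
        if min(curr) > k:
            return False
        prev = curr
    return prev[-1] <= k


def similarity_search(query: str, strings: list[str], k: int) -> list[str]:
    """Linear scan returning every string within edit distance k of query, in input order."""
    return [s for s in strings if within_edit_distance(query, s, k)]


print(similarity_search("kitten", ["sitting", "mitten", "kitchen", "bitten", "kitten"], k=1))
```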
“…SSS algorithms and, very recently, partition-based SSJ algorithms are omitted from their work. Yu et al. [51] and Wandelt et al. [44] survey work on string similarity. However, string similarity differs from set similarity both in how it operates and in the similarity functions it uses [44].…”
Section: Introduction (mentioning; confidence: 99%)
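The last passage contrasts string similarity with set similarity. Below is a small sketch that shows how differently the two behave. It assumes whitespace tokenization and Jaccard similarity as the set measure; both are illustrative choices, and the helper names are hypothetical. Reordering words leaves the token sets identical. A one-character typo, by contrast, is a single edit but replaces a whole token.

```python
def token_set(s: str) -> set[str]:
    """Lower-cased whitespace tokens; the tokenizer is an illustrative assumption."""
    return set(s.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Set similarity |A ∩ B| / |A ∪ B|, taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


reordered = ("string similarity join", "join string similarity")
typo = ("string similarity", "string similarty")
for s, t in (reordered, typo):
    print(f"{s!r} vs {t!r}: Jaccard={jaccard(token_set(s), token_set(t)):.2f}, "
          f"edit distance={edit_distance(s, t)}")
```

The reordered pair has a Jaccard similarity of 1.0 but a sizeable edit distance. The typo pair has an edit distance of 1 but a Jaccard similarity of only 1/3, because "similarity" and "similarty" are different tokens.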