1997
DOI: 10.1016/s0304-3975(96)00268-x

Block edit models for approximate string matching

Cited by 80 publications (40 citation statements); references 13 publications.

Citation statements:

“…Computing the edit distance in the presence of such large-scale operations is typically NP-hard, depending on the exact model [10,11], and authors concentrate on providing approximation algorithms [12][13][14] or algorithms for special cases where editing proceeds from left to right [15]. In general, these problems differ from ours by allowing a richer set of operations, but also by fixing both endpoints of the history, whereas we minimize over all possible initial permutations.…”
Section: (A) Motivation and Related Work (mentioning; confidence: 99%)

“…A number of similarity functions for approximately matching strings have been proposed in the research literature. Popular measures include the Jaccard coefficient and Cosine similarity metrics from information retrieval (IR) [19,8], extensions (of Cosine similarity) to use q-grams instead of words [17], and the edit distance family of functions [10,24,18,22]. We use sim_a(u, v) to denote the similarity between strings u and v when u and v are considered as values of the attribute a.…”
Section: Similarity Between Value Pairs (mentioning; confidence: 99%)

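The measures named in this excerpt can be illustrated with their textbook definitions. The Python sketch below is an illustration under stated assumptions, not the formulation used by the citing paper: it treats the Jaccard coefficient as the overlap of lowercase whitespace-separated word sets, and cosine similarity as the similarity of raw (unweighted, no IDF) q-gram count vectors with `#` padding at the string ends. The function names `jaccard_words`, `qgrams`, and `cosine_qgrams` and the default q = 3 are hypothetical choices.

```python
from collections import Counter
import math


def jaccard_words(u: str, v: str) -> float:
    """Jaccard coefficient of the sets of lowercase, whitespace-separated words."""
    a, b = set(u.lower().split()), set(v.lower().split())
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return len(a & b) / len(a | b)


def qgrams(s: str, q: int = 3) -> Counter:
    """Multiset of overlapping q-grams; '#' padding lets short strings still yield grams."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def cosine_qgrams(u: str, v: str, q: int = 3) -> float:
    """Cosine similarity between raw q-gram count vectors (assumption: no IDF weighting)."""
    a, b = qgrams(u, q), qgrams(v, q)
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(jaccard_words("john smith", "smith john"))
    print(cosine_qgrams("john smith", "smith john"))
```

For example, `jaccard_words("john smith", "smith john")` returns 1.0 because word order is ignored, while the q-gram cosine for the same pair drops below 1 because the q-grams spanning the space and the padded string ends change when the words are swapped.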
“…The edit distance metric works well for typographical errors but it cannot capture word rearrangements, insertions, and deletions. To address this, numerous variants of the edit distance metric have been proposed in the literature like affine gap distance [24] that allows gap mismatches, block edit distance [18] that allows word moves, and a fuzzy match similarity function that allows words to be inserted/deleted with a cost equal to the IDF weight of the word [22]. However, most variants either do not handle word rearrangements well, or are too expensive from a computation perspective.…”
Section: Related Work (mentioning; confidence: 99%)

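To make the point about word rearrangements concrete, here is a Python sketch contrasting classic unit-cost edit distance with a hypothetical toy variant, `word_move_distance`, in which the words of the first string may be reordered at no cost before character edits are charged. This toy is not the block edit model of the cited paper, which this listing does not describe; it only shows how permitting word moves changes the result.

```python
from itertools import permutations


def levenshtein(s: str, t: str) -> int:
    """Classic unit-cost edit distance (insert, delete, substitute), computed row by row."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def word_move_distance(s: str, t: str) -> int:
    """Hypothetical toy model: reorder the words of s for free, then charge character edits.

    Brute force over every word order, so it costs O(k!) for k words and is only
    usable on tiny inputs.
    """
    words = s.split()
    return min(levenshtein(" ".join(p), t) for p in permutations(words))


if __name__ == "__main__":
    print(levenshtein("john smith", "smith john"))
    print(word_move_distance("john smith", "smith john"))
```

Here `word_move_distance("john smith", "smith john")` is 0, whereas plain `levenshtein` charges a nonzero cost for the same pair. The factorial enumeration is only a naive illustration; by itself it says nothing about the complexity of a real block edit model, which the citing works describe as NP-hard or NP-complete depending on the exact model.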
“…Unfortunately, many of the interesting varieties of the block edit problem are NP-complete [19]. An NP-complete block edit problem can be solved optimally for a fixed input size, larger than is feasible with present-day computers, using post- and at-fabrication-time computation. Although approximation algorithms may exist for finding suboptimal solutions, we are interested in finding the optimal solution.…”
Section: An Example: Solving the Block Edit Problem (mentioning; confidence: 99%)