Fast Approximate Search in Large Dictionaries

2004. DOI: 10.1162/0891201042544938

Abstract: The need to correct garbled strings arises in many areas of natural language processing. If a dictionary is available that covers all possible input tokens, a natural set of candidates for correcting an erroneous input P is the set of all words in the dictionary for which the Levenshtein distance to P does not exceed a given (small) bound k. In this article we describe methods for efficiently selecting such candidate sets. After introducing as a starting point a basic correction method based on the concept of …
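The candidate-selection problem the abstract describes can be sketched naively as an exhaustive scan: compute the Levenshtein distance from the input P to every dictionary word and keep those within the bound k. (This is only an illustration of the problem statement; the paper's contribution is precisely how to avoid this exhaustive scan. Function names are illustrative.)

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[-1]

def candidates(pattern: str, dictionary, k: int):
    """All dictionary words whose Levenshtein distance to pattern is <= k."""
    return [w for w in dictionary if levenshtein(pattern, w) <= k]
```

For a large dictionary this scan is far too slow, which motivates the indexed and automaton-based methods discussed in the paper and the citing works below.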

Cited by 51 publications (40 citation statements).
References 40 publications.
“…When dealing with misspelled queries, the aim is to replace the erroneous term or terms in the query with those considered to be the correct ones and whose edit distance with regard to the former is the smallest possible. This will imply a greater or lesser quality and computational complexity according to the strategy adopted (Mihov and Schulz, 2004).…”
Section: The Spelling Correction Approach
confidence: 99%
“…(There are more efficient ways of achieving the same result; see, for example, Oflazer (1996), Savary (2002), or Mihov and Schulz (2004).) Some spellcheckers assume that the first letter of the misspelling is correct, which it usually is (Yannakoudakis and Fawthrop 1983), to save themselves the bother of looking up words from all parts of the dictionary.…”
Section: A Simple Corrector
confidence: 99%
“…This allows one to start with an exact match and then extend these exact matches to longer candidates, while the threshold increases slowly in a stepwise manner. To implement this idea in practice, two kinds of resources are used: (i) a linear-space representation of the infixes in the finite set of words that enables left/right extension of an infix in constant time per character; and (ii) efficient filters, namely universal Levenshtein automata [18], synchronised Levenshtein automata [19] and the standard Ukkonen filter [27], that prune unsuccessful candidates as soon as clear evidence for this occurs. The index structure also encodes information about the possible lengths of the longest/shortest left/right extensions.…”
Section: Team
confidence: 99%
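The "standard Ukkonen filter" mentioned in the excerpt above refers to Ukkonen's cut-off technique for bounded edit distance: only a diagonal band of width 2k+1 of the DP matrix is computed, and a candidate is rejected as soon as every cell in the current band exceeds k. A minimal sketch of that idea (not the cited implementation) might look like:

```python
def within_k(a: str, b: str, k: int) -> bool:
    """Ukkonen-style banded test: True iff Levenshtein(a, b) <= k.

    Only cells within k of the main diagonal are computed; all other
    cells are treated as exceeding the bound (sentinel k + 1)."""
    if abs(len(a) - len(b)) > k:        # length filter: cheap early reject
        return False
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        lo, hi = max(1, i - k), min(len(b), i + k)
        cur = [k + 1] * (len(b) + 1)    # cells outside the band stay > k
        if lo == 1:
            cur[0] = i                  # first column, while still in band
        for j in range(lo, hi + 1):
            cur[j] = min(prev[j] + 1,                       # deletion
                         cur[j - 1] + 1,                     # insertion
                         prev[j - 1] + (ca != b[j - 1]))     # substitution
        if min(cur[lo:hi + 1]) > k:     # whole band exceeded k: prune early
            return False
        prev = cur
    return prev[len(b)] <= k
```

Because the band has constant width for fixed k, the test runs in O(k·|a|) time rather than O(|a|·|b|), which is what makes it usable as a cheap filter in front of more expensive candidate extension.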