1964
DOI: 10.1145/363958.363994

A technique for computer detection and correction of spelling errors

Abstract: The method described assumes that a word which cannot be found in a dictionary has at most one error, which might be a wrong, missing or extra letter or a single transposition. The unidentified input word is compared to the dictionary again, testing each time to see if the words match—assuming one of these errors occurred. During a test run on garbled text, correct identifications were made for over 95 percent of these error types.
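
The single-error test described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's original implementation: the function names `differs_by_one_error` and `candidates` are hypothetical, the dictionary is assumed to be a plain list of strings, and "transposition" is assumed to mean a swap of two adjacent letters.

```python
from typing import Iterable, List


def differs_by_one_error(word: str, entry: str) -> bool:
    """Return True if `word` matches `entry` after undoing exactly one error:
    a wrong letter, a missing letter, an extra letter, or a swap of two
    adjacent letters (adjacency is an assumption of this sketch)."""
    if word == entry:
        return False
    lw, le = len(word), len(entry)
    if abs(lw - le) > 1:
        return False

    # Find the first position where the two strings disagree.
    i = 0
    while i < min(lw, le) and word[i] == entry[i]:
        i += 1

    if lw == le:
        # Wrong letter: everything after position i must agree.
        if word[i + 1:] == entry[i + 1:]:
            return True
        # Transposition: positions i and i+1 are swapped, the rest agrees.
        return (
            i + 1 < lw
            and word[i] == entry[i + 1]
            and word[i + 1] == entry[i]
            and word[i + 2:] == entry[i + 2:]
        )
    if lw + 1 == le:
        # Missing letter in `word`: skip entry[i] and compare the tails.
        return word[i:] == entry[i + 1:]
    # Extra letter in `word`: skip word[i] and compare the tails.
    return word[i + 1:] == entry[i:]


def candidates(word: str, dictionary: Iterable[str]) -> List[str]:
    """Compare an unidentified word against every dictionary entry and
    keep the entries reachable by one of the four single errors."""
    return [entry for entry in dictionary if differs_by_one_error(word, entry)]
```

For example, `candidates("teh", ["the", "then", "dog"])` keeps only `"the"`, which differs from the input by a single transposition.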

Cited by 1,129 publications (674 citation statements)
References 3 publications
“…Some examples of this approach are Hamming distance [6], Levenshtein distance [7]. Damerau-Levenshtein [7], [8], Needleman-Wunsch [9], Longest Common Subsequence [10]. Smith-Waterman [11], Jaro [12], JaroWinkler [13], and N-gram [14], [15].…”
Section: Text Similarity Algorithms (mentioning)
confidence: 99%
“…To identify plausible misspellings, we rely on the Damerau-Levenshtein distance [2,6]: the minimum number of insertions, deletions, substitutions or transpositions required to transform one string into another. For example, faceboolk, facebok, faceboik, and faceboko each have a Damerau-Levenshtein distance of 1 from facebook.…”
Section: Identifying Typosquatting Domains (mentioning)
confidence: 99%
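
The distance used in the statement above can be illustrated with a short dynamic-programming sketch. It implements the restricted variant often called optimal string alignment, in which no substring is edited more than once; that choice and the name `osa_distance` are assumptions of this example, and the unrestricted Damerau-Levenshtein distance can be smaller for some string pairs. All four operations are given unit cost.

```python
def osa_distance(s: str, t: str) -> int:
    """Optimal string alignment distance: the minimum number of insertions,
    deletions, substitutions, and adjacent transpositions needed to turn
    `s` into `t`, with each operation costing 1."""
    m, n = len(s), len(t)
    # d[i][j] holds the distance between s[:i] and t[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition: the last two characters are swapped.
            if (
                i > 1
                and j > 1
                and s[i - 1] == t[j - 2]
                and s[i - 2] == t[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


# The four misspellings quoted above, each one edit away from "facebook".
for typo in ("faceboolk", "facebok", "faceboik", "faceboko"):
    assert osa_distance(typo, "facebook") == 1
```

The table costs O(len(s) * len(t)) time and space, which is acceptable for short strings such as domain names.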
“…The Damerau-Levenshtein metric, also known as edit distance, is a measure of string similarity defined as the minimal number of operations needed to transform one string into another (Damerau, 1964). The distance between the names s and t would be the number of edit operations that convert s into t. Assuming that most misspellings are single-character errors, as has been shown by different studies (Damerau, 1989;Petersen, 1986;Pollok & Zamora, 1983), the edit operations would consist of the insertion, deletion, or substitution of a single character, or the transposition of two characters, taking into account the cost of each operation.…”
Section: Similarity Relations (mentioning)
confidence: 99%