1994
DOI: 10.1080/09296179408590017

Detection of spelling errors in Swedish not using a word list En Clair*

Abstract: We investigate how to construct an efficient method for spelling error detection and correction under the prerequisite of using a word list that is encoded and not possible to decode. Our method is probabilistic and the word list is stored as a Bloom filter. In particular, we study how to handle compound words and inflections in Swedish.
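As a rough illustration of the idea in the abstract, the sketch below stores a word list as a Bloom filter. The filter keeps only hashed bit positions, so the words cannot be read back out of it, and a lookup can give a false positive but never a false negative. The class name, the sizes, and the SHA-256 double-hashing scheme are assumptions made for this example, not details taken from the paper.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical parameters, not the paper's)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word: str) -> Iterator[int]:
        # Derive all bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word: str) -> bool:
        # True means "probably in the list"; False means "definitely not".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(word)
        )


# Toy word list for illustration only.
word_filter = BloomFilter(num_bits=1 << 16, num_hashes=7)
for w in ["hus", "bil", "sjuk", "dörr"]:
    word_filter.add(w)
```

A word that is rejected by the filter is certainly not in the list and can be flagged as a possible spelling error; an accepted word is only probably in the list, which is why the method is described as probabilistic.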

Cited by 15 publications (9 citation statements)
References 10 publications
“…If a successful decomposition of an unknown compound word can be made, we have strong evidence of the correct possible tags for that word, since the last word form in a compound determines its part-of-speech. In STAVA [18][19][20] an algorithm for decomposing compounds into their word form parts was implemented.…”
Section: Analyzing Compound Words (citation type: mentioning)
confidence: 99%
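The statement above can be illustrated with a small sketch: split an unknown word into known word forms, then take the part-of-speech tag of the last part. The lexicon, the tag names, and the function names are hypothetical, and this is not the STAVA algorithm, only a naive recursive split under those assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon: word form -> part-of-speech tag.
LEXICON: Dict[str, str] = {
    "sjuk": "adjective",
    "hus": "noun",
    "bil": "noun",
    "dörr": "noun",
}


def decompose(word: str, lexicon: Dict[str, str] = LEXICON) -> Optional[List[str]]:
    """Return one split of `word` into known word forms, or None if none exists."""
    if word in lexicon:
        return [word]
    for i in range(1, len(word)):
        head, rest = word[:i], word[i:]
        if head in lexicon:
            tail = decompose(rest, lexicon)
            if tail is not None:
                return [head] + tail
    return None


def guess_tag(word: str, lexicon: Dict[str, str] = LEXICON) -> Optional[str]:
    """Tag a compound with the tag of its last part, if it can be decomposed."""
    parts = decompose(word, lexicon)
    return lexicon[parts[-1]] if parts else None
```

With this toy lexicon, "sjukhus" splits into "sjuk" + "hus" and inherits the tag of "hus", even though its first part is an adjective.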
“…Accounting for and listing all the possible words is not feasible for purposes of error correction. Domeij proposed a method to build a spell checker that utilizes stem lists and orthographic rules, which govern how a word is written, and morphotactic rules, which govern how morphemes (building blocks of meanings) are allowed to combine, to accept legal combinations of stems (Domeij et al 1994). By breaking up compound words, dictionary lookup can be applied to individual constituent stems.…”
Section: OCR Error Correction (citation type: mentioning)
confidence: 99%
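A minimal sketch of the stem-list idea described above: separate stem lists say which stems may start and which may end a compound (a stand-in for morphotactic rules), and an optional linking "s" between the parts stands in for an orthographic rule. The stem lists, the single linking element, and the function name are assumptions for illustration and do not reproduce the rules of Domeij et al.

```python
# Hypothetical stem lists: which stems may appear first or last in a compound.
FIRST_STEMS = {"kväll", "stol", "bil"}
LAST_STEMS = {"tidning", "ben", "dörr"}

# Allowed linking elements between the two parts: none, or a linking "s".
LINKS = ("", "s")


def is_legal_compound(word: str) -> bool:
    """Accept `word` if it is first stem + optional link + last stem."""
    for i in range(1, len(word)):
        first, rest = word[:i], word[i:]
        if first not in FIRST_STEMS:
            continue
        for link in LINKS:
            if rest.startswith(link) and rest[len(link):] in LAST_STEMS:
                return True
    return False
```

Here "kvällstidning" is accepted as "kväll" + "s" + "tidning" without the full compound being listed anywhere, while "tidningkväll" is rejected because neither stem is allowed in that position.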
“…There are two main approaches to error correction, namely, word level and passage level. Some of the kinds of word-level postprocessing include the use of dictionary lookup [Brill and Moore 2000; Church and Gale 1991; Hong 1995; Jurafsky and Martin 2000a], character [Lu et al 1999; Taghva et al 1994] and word n-gram frequency analysis [Hong 1995; Magdy and Darwish 2006b], and morphological analysis [Domeij et al 1994; Oflazer 1996]. Passage-level postprocessing techniques include the use of word n-grams [Magdy and Darwish 2006b], word collocations [Hong 1995], grammar [Agirre et al 1998] (which is challenging due to the current poorness of Arabic parsing [Moussa et al 2003]), conceptual closeness [Hong 1995], passage-level word clustering [Taghva et al 1994] (which requires handling of affixes for Arabic [De Roeck and Al-Fares 2000]), and linguistic and visual context [Hong 1995].…”
Section: OCR Error Correction (citation type: mentioning)
confidence: 99%