2010
DOI: 10.14746/il.2010.21.1

Creating and Weighting Hunspell Dictionaries as Finite-State Automata

Abstract: There are numerous formats for writing spell-checkers for open-source systems and there are many lexical descriptions for natural languages written in these formats. In this paper, we demonstrate a method for converting Hunspell and related spell-checking lexicons into finite-state automata. We also present a simple way to apply unigram corpus training in order to improve the spell-checking suggestion mechanism using weighted finite-state technology. What we propose is a generic and efficient language…

Cited by 12 publications (6 citation statements)
References 7 publications
“…The Hunspell spelling system is adapted to languages with rich morphology, and was originally developed for Hungarian. In OpenOffice.org Hunspell supports over 98 languages (Pirinen and Lindén 2010). When using these spell checkers for clinical text medical, or clinical dictionaries need to be added, see Patrick and Nguyen (2011).…”
Section: Spell Checking Of Clinical Text
Type: mentioning
confidence: 99%
“…They use test corpora containing words and calculate an error value for every record using overstemming and understemming indices [2]. We also found a very detailed Hungarian paper comparing some of the analyzers that we will evaluate, namely Hunmorph-Ocamorph [16], Hunmorph-Foma [5], Humor [13] and Hunspell [9]. The authors also brought in some other models that we didn't consider, as they can only be used for stemming purposes and not lemmatization or deeper morphological analysis: the Porter stemmer [10], the Hungarian adaptation of Snowball [11,14] and some Apache Lucene [6] modules like KStem, Porter, EnglishMinimal, Stempel and Morfologik.…”
Section: Morphological Analyzers
Type: mentioning
confidence: 99%
“…Hunspell [9] is a popular open-source spell checker of LibreOffice, OpenOffice.org, Mozilla Firefox and Thunderbird, Google Chrome, etc. Although it's mainly used for spell checking, it has other use cases as well, including morphological analysis.…”
Section: Hunspell
Type: mentioning
confidence: 99%
“…A real-word spelling error occurs when one word is mistakenly used for another word, such as in fly form* Paris. Typical approaches include using confusion set (Golding and Roth, 1999; Carlson et al., 2001), contextual information (Verberne, 2002; Islam and Inkpen, 2009), and others (Pirinen and Linden, 2010; Amorim and Zampieri, 2013).…”
Section: Introduction
Type: mentioning
confidence: 99%