Comparing a linguistic and a stochastic tagger

Samuelsson, Christer; Voutilainen, A

doi:10.3115/976909.979649

Cited by 34 publications

(22 citation statements)

References 16 publications

(16 reference statements)


“…Taggers can be based on stochastic models [2-7], on rules [8,9], or on neural networks [10]. In a recent paper, Samuelsson and Voutilainen claim that rule-based taggers can give higher tagging accuracy than plain stochastic taggers on correct texts [11]. However, hybrids between rule-based taggers and stochastic taggers might be even better [12].…”

mentioning

confidence: 99%

Implementing an efficient part-of-speech tagger

Carlberger

Kann

1999

Softw: Pract. Exper.


An efficient implementation of a part-of-speech tagger for Swedish is described. The stochastic tagger uses a well-established Markov model of the language. The tagger tags 92 per cent of unknown words correctly and up to 97 per cent of all words. Several implementation and optimization considerations are discussed. The main contribution of this paper is the thorough description of the tagging algorithm and the addition of a number of improvements. The paper contains enough detail for the reader to construct a tagger for his own language.

… grammar checking. The applications require the tagger to be both efficient (to tag quickly, especially important in information retrieval), and accurate (to tag correctly, especially important in translation). In some applications, it is not even enough to have the text syntactically disambiguated - a word sense disambiguation is needed, and that is an even harder problem [1].

Part-of-speech taggers can be constructed in various ways, and different types of taggers have different advantages. Taggers can be based on stochastic models [2-7], on rules [8,9], or on neural networks [10]. In a recent paper, Samuelsson and Voutilainen claim that rule-based taggers can give higher tagging accuracy than plain stochastic taggers on correct texts [11]. However, hybrids between rule-based taggers and stochastic taggers might be even better [12].

Some different stochastic models for tagging unknown words exist [2,4]. A good survey of automatic stochastic part-of-speech tagging is Charniak [13].

In this paper, we describe an implementation of a part-of-speech tagger for Swedish. We wanted the tagger to be easy to implement, fast, language independent, tag set independent, and that it should give high accuracy of tagging. We also wanted the tagger to be able to cope with unknown words and grammatically erroneous sentences. This ability is needed in various applications, such as grammar and spell checking.

Given these requirements, we chose to construct a stochastic tagger based on a Markov model. Our goal was to achieve 95 per cent tagging accuracy for known words and 70 per cent accuracy for unknown words, and we both reached and surpassed the goal.

We use the tagger in a grammar checking program for Swedish, named GRANSKA, but we designed it to be as language independent as possible, and we think that it can be used for most inflectional languages, for any tag set, and in any application needing part-of-speech tagging. As it turned out, when incorporated into GRANSKA, our tagger actually became a hybrid between a stochastic tagger and a rule-based tagger. For certain complicated cases where the stochastic tagger could be wrong, we use rules to find the correct tagging.

THE TAGGING MODEL

Markov model

In this section, we briefly describe the Markov model that is used as a stochastic model of the language. A complete and excellent description of the equations used in the standard Markov model for part-of-speech tagging can be found in Charniak et al. [2].



“…It calculates the lexical probabilities of unknown words based on their suffixes. Comparison between statistical and linguistic rule based taggers shows that for the same amount of remaining ambiguity, the error rate of a statistical tagger is one order of magnitude greater than that of the rule based one [4]. The taggers described above are specifically designed for relatively fixed word order languages, where position of the word plays an important role.…”

Section: Literature Survey a Existing Work

mentioning

confidence: 97%

A Suffix-Based Noun and Verb Classifier for an Inflectional Language

Saharia

Sharma

Kalita

2010

2010 International Conference on Asian Language Processing


“…Their claim of better quality with comparable development time for the constraint-based grammar, however, loses in importance because the HMM tagger was trained using unsupervised training only. A more recent comparison was reported by Samuelsson and Voutilainen (1997). A state-of-the-art statistical tagger was trained on a corpus of over 300,000 words manually analysed (and proofread several times) according to the EngCG grammatical representation.…”

Section: The Current Situation

mentioning

confidence: 99%

“…Its error rate is between a half and two-thirds of that of the older versions, while the amount of ambiguity it leaves is well below half of that left by the older versions. A performance test and a comparison to a state-of-the-art statistical tagger is reported by Samuelsson and Voutilainen (1997). EngCG-2 documentation and an interactive demo can be found at the following URL: http://www .…”

Section: Some Facts About a Large Grammar

mentioning

confidence: 99%

Syntactic Wordclass Tagging

Halteren

1999

Text, Speech and Language Technology

