1999
DOI: 10.1002/(sici)1097-024x(19990725)29:9<815::aid-spe256>3.0.co;2-f

Implementing an efficient part-of-speech tagger

Abstract: An efficient implementation of a part-of-speech tagger for Swedish is described. The stochastic tagger uses a well-established Markov model of the language. The tagger tags 92 per cent of unknown words correctly and up to 97 per cent of all words. Several implementation and optimization considerations are discussed. The main contribution of this paper is the thorough description of the tagging algorithm and the addition of a number of improvements. The paper contains enough detail for the reader to construct a…

Cited by 31 publications (25 citation statements)
References 12 publications (8 reference statements)

“…Most of the systems evaluated are not adapted for Swedish, but use generic and fairly language-independent models that have previously been developed for and tested on languages other than Swedish. The primary exception is the Granska Tagger (Carlberger and Kann, 1999), which was designed particularly for Swedish and includes a large lexicon as well as a compound analyzer. In part because of this, Granska…”
Section: Swedish PoS Tagging
Citation type: mentioning
confidence: 65%
“…Although previous studies have shown that using modified versions of the SUC 2 tagset can lead to more accurate PoS tagging (Forsbom, 2008; Carlberger and Kann, 1999), we have decided to keep the SUC 2 tagset, since this has become a de-facto standard for Swedish PoS tagging. Forsbom and Wilhelmsson (2010) previously found that a subset of the annotation changes now used in SUC 3 led to an improvement in accuracy for a data-driven PoS tagger, compared to version 2 of SUC.…”
Section: Stockholm-Umeå Corpus
Citation type: mentioning
confidence: 99%
“…To improve training, we used only verbs and nouns [5]. We tagged the corpus words with their part of speech using the Granska tagger [7]. We kept the nouns and the verbs and discarded the rest of the words, including the markers from the patterns.…”
Section: Extraction of Text Spans
Citation type: mentioning
confidence: 99%