2012
DOI: 10.15398/jlm.v0i1.35

Slovak Morphosyntactic Tagset

Abstract: Morphological annotation constitutes essential, very useful and very common linguistic information presented in corpora, especially for highly inflectional languages. The morphological tagset used in the Slovak National Corpus has been designed with several goals in mind – the tags are compact and easily human-readable, without sacrificing their informational contents. The tags consist of ASCII letters, numbers and several other characters. In general, they have a variable number of symbols, but their order is…

Cited by 5 publications (3 citation statements)
References 3 publications
“…The most important contribution to the automated morphological analysis of the Slovak language was a proposal of the morphologically annotated corpus of the by Slovak National Corpus. The corpus defines a set of Part-of-speech (POS) tags and annotation guidelines for the Slovak language, and it is still the most used set of morphological tags [25]. The tags are of unequal length, but most tags follow the same structure for the same inflectional paradigm.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…State-of-the-art lemmatization and tagging for Slovak is performed by the MorphoDiTa tagger (Straková et al. 2014) that we trained on manually lemmatized and MSD annotated Slovak corpus r-mak-6.0 of 1.2 million tokens. The accuracy of lemmatization on general texts is 98.2%; the accuracy of MSD tagging 94.2%; the accuracy of POS tagging 98.1% and the combined accuracy of lemmatization+MSD tagging is 93.5% (Garabík – Mitana 2022). Unfortunately, we cannot easily estimate the accuracy on the texts from the legal language domain; nevertheless, since the training corpus r-mak-6.0 contains one manually lemmatized and MSD annotated legal text, although strictly speaking not a law (Programové vyhlásenie vlády), we can obtain at least a rough estimate by training a separate tagger model on the rest of the corpus and calculating the accuracy on the one legal text.…”
Section: Lemmatization and MSD Tagging
Citation type: mentioning
confidence: 99%
“…At the first step of our study we prepare morphologically annotated sentence-aligned parallel texts. The Slovak texts are morphologically annotated automatically by the tagger Morče which has been trained and tuned on tagset, developed by the Slovak National Corpus (Garabík & Šimková, 2012).…”
Section: Morphological Annotation
Citation type: mentioning
confidence: 99%