Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing 2017
DOI: 10.18653/v1/w17-1406

The Universal Dependencies Treebank for Slovenian

Abstract: This paper introduces the Universal Dependencies Treebank for Slovenian. We overview the existing dependency treebanks for Slovenian and then detail the conversion of the ssj200k treebank to the framework of Universal Dependencies version 2. We explain the mapping of part-of-speech categories, morphosyntactic features, and the dependency relations, focusing on the more problematic language-specific issues. We conclude with a quantitative overview of the treebank and directions for further work.

Cited by 31 publications (34 citation statements)
References 9 publications (12 reference statements)
“…discourse:filler for annotation of hesitation sounds). In subsequent comparison of the SST treebank with the written SSJ Slovenian UD treebank (Dobrovoljc et al, 2017), Dobrovoljc and Nivre (2016) observed several syntactic differences between the two modalities, as also illustrated in Figure 1.…”
Section: Spoken Slovenian Treebank (mentioning)
confidence: 85%
“…The Spoken Slovenian Treebank (Dobrovoljc and Nivre, 2016), which was first released as part of UD v1.3 (under the CC-BY-NC-SA 4.0 licence), is the first syntactically annotated collection of spontaneous speech in Slovenian. It is a sample of the Gos reference corpus of Spoken Slovenian (Zwitter Vitez et al, 2013), a collection of transcribed audio recordings of spontaneous speech in different everyday situations, in both public (TV and radio shows, school lessons, academic lectures etc.)…”
Section: Spoken Slovenian Treebank (mentioning)
confidence: 99%
“…The UD policy for such cases is to connect them to the main clause with a parataxis tag. Some of the UD spoken treebanks (Dobrovoljc and Nivre, 2016; Gerdes and Kahane, 2017; …) keep the discourse information via the subtype parataxis:discourse. We follow their approach and employ the same tag as exemplified in 10…”
Section: Issues Related To Spoken Language (mentioning)
confidence: 99%
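
To make the subtype convention in the statement above concrete, here is a minimal Python sketch that counts dependency relations in a CoNLL-U fragment and also groups subtypes such as parataxis:discourse or discourse:filler under their base relation. The sample sentence, its annotation, and the function name `count_deprels` are hypothetical and invented for illustration; they are not taken from any of the treebanks cited here. The only thing assumed from the UD format is the ten-column, tab-separated CoNLL-U layout with the dependency relation in the eighth column.

```python
from collections import Counter

# Hypothetical CoNLL-U fragment, invented for illustration only.
SAMPLE = "\n".join([
    "# text = eee to je dobro veš",
    "1\teee\teee\tINTJ\t_\t_\t4\tdiscourse:filler\t_\t_",
    "2\tto\tta\tPRON\t_\t_\t4\tnsubj\t_\t_",
    "3\tje\tbiti\tAUX\t_\t_\t4\tcop\t_\t_",
    "4\tdobro\tdober\tADJ\t_\t_\t0\troot\t_\t_",
    "5\tveš\tvedeti\tVERB\t_\t_\t4\tparataxis:discourse\t_\t_",
])


def count_deprels(conllu):
    """Return (full, base) counters of dependency relations.

    `full` counts each label as written (e.g. "parataxis:discourse");
    `base` counts only the part before the first colon (e.g. "parataxis").
    """
    full, base = Counter(), Counter()
    for line in conllu.splitlines():
        # Skip blank lines (sentence boundaries) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only regular word lines: ten columns and an integer ID.
        # This skips multiword token ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        deprel = cols[7]  # DEPREL is the eighth column
        full[deprel] += 1
        base[deprel.split(":", 1)[0]] += 1
    return full, base


full, base = count_deprels(SAMPLE)
for label, n in sorted(full.items()):
    print(label, n)
```

Keeping both counters allows comparison against treebanks that do not use the subtype, since the base counter collapses parataxis:discourse into plain parataxis.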
“…However, due to numerous differences between the two annotation systems, especially at the level of syntactic description, a complex system of conversion rules was additionally created.…”
Section: Part-of-speech Tagging and Lemmatization (mentioning)
confidence: 99%