2007
DOI: 10.1007/s10579-007-9035-7

Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language

Abstract: Speech-to-speech translation technology has difficulties processing elements of spontaneity in conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, as there is no consistency on what expressions count as discourse markers, we have to reconsider how to set a framework for annotating, and, in order to better understand what we gain by introducing a discourse …

Cited by 10 publications (22 citation statements)
References 28 publications
“…is used more frequently in the Turdis-2 Corpus, but the difference in frequency is less pronounced (there are five times as many instances in the Turdis-2 Corpus as in the BNSIint Corpus) than for the previous set of discourse markers. In general, according to Verdonik (2007) and Verdonik, Rojc et al (2007), the most significant pragmatic functions, as well as the contextual factors shaping the use of dobro . .…”
Section: Dobro 'Right'
Citation type: mentioning
confidence: 99%
“…According to Verdonik (2007) and Verdonik, Rojc et al (2007), glejte 'look' is used approximately twice as frequently in telephone conversations as in television interviews. It is used to attract the hearers' attention, it signals that the speaker is about to explain something.…”
Section: Glejte 'Look' and Veste 'You Know'
Citation type: mentioning
confidence: 99%
“…Data with DM annotation is scarcer for languages other than English. In a corpus of transcribed conversations in Slovenian, Verdonik et al (2007) annotated transcripts of 106 minutes of speech (15,517 words) with about twenty types of Slovenian DMs and their variants, finding 2,158 tokens. In an analysis of Dutch connectors, Penning and Theune (2007) counted DMs in a corpus of about 97,000 words, using sampling to estimate the frequency of lexical items that can serve as DMs or non-DMs.…”
Section: DM Frequencies in the ICSI-MR and Other Corpora
Citation type: mentioning
confidence: 99%
“…Recently, Fitzgerald and Jelinek (2008) presented a new annotation scheme for spontaneous speech reconstruction, which has also been extended to Czech by Hajič et al (2008). Among related work on other Slavic languages, we can mention the paper about discourse marker annotation in spoken Slovenian by Verdonik et al (2007).…”
Section: Introduction
Citation type: mentioning
confidence: 99%