2015
DOI: 10.1017/s1351324915000236

Modernising historical Slovene words

Abstract: We propose a language-independent word normalisation method and exemplify it on modernising historical Slovene words. Our method relies on character-level statistical machine translation (CSMT) and uses only shallow knowledge. We present relevant data on historical Slovene, consisting of two (partially) manually annotated corpora and the lexicons derived from these corpora, containing historical word-modern word pairs. The two lexicons are disjoint, with one serving as the training set containing 40,000 entrie…

Cited by 30 publications (37 citation statements)
References: 28 publications
“…A more recent approach is based on character-based statistical machine translation applied to historical text (Pettersson et al., 2013; Sánchez-Martínez et al., 2013; Scherrer and Erjavec, 2013; or dialectal data (Scherrer and Ljubešić, 2016). This is conceptually very similar to our approach, except that we substitute the classical SMT algorithms for neural networks.…”
Section: Related Work (mentioning)
confidence: 99%
“…We experiment with both SMT and NMT implementations as contrastive methods. For our SMT pipeline, we employ a fairly standard array of tools, and set their parameters similarly to Scherrer and Erjavec (2013) and Scherrer and Ljubešić (2016). For alignment, we use MGIZA (Gao and Vogel, 2008) with grow-diag-final-and symmetrization.…”
Section: Experiments and Results (mentioning)
confidence: 99%
“…Internal variation in the data is only dealt with indirectly by mapping the non-standard types to a corresponding standard type. Hence, it resembles a translation task, a framework in which normalization has been approached (Kobus et al., 2008; Scherrer and Erjavec, 2016). The task of detecting spelling variants shifts the attention towards the internal variation and resembles an information retrieval task where the aim is to detect unordered pairs of types like GML {jc, ik} which are used to realize the same morphological word.…”
Section: Related Work (mentioning)
confidence: 99%