Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & 2006
DOI: 10.3115/1608974.1609002

Unsupervised discovery of Persian morphemes

Abstract: This paper reports current results of research on unsupervised Persian morpheme discovery. We present a method for discovering the morphemes of the Persian language through automatic analysis of corpora. We used a Minimum Description Length (MDL) based algorithm with some improvements and applied it to a Persian corpus. Our improvements include enhancing the cost function using some heuristics, preventing the split of high-frequency chunks, exploiting a penalty for first and last letters and di…
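
The MDL idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it uses a generic two-part code (bits to spell out the morph lexicon plus bits to encode the corpus with that lexicon) and does not model the heuristics or penalties the abstract lists. The function name `description_length`, the fixed `alphabet_size`, and the transliterated toy words are assumptions made for illustration only.

```python
import math
from collections import Counter

def description_length(segmented_corpus, alphabet_size=32):
    """Generic two-part MDL cost in bits (illustrative, not the paper's cost function).

    segmented_corpus: list of words, each word a list of morph strings.
    alphabet_size: assumed number of letters, used for a fixed-length letter code.
    """
    # Flatten the corpus into a stream of morph tokens.
    tokens = [morph for word in segmented_corpus for morph in word]
    counts = Counter(tokens)
    total = len(tokens)
    bits_per_letter = math.log2(alphabet_size)
    # Lexicon cost: spell out each distinct morph plus one end-of-morph marker.
    lexicon_bits = sum((len(m) + 1) * bits_per_letter for m in counts)
    # Corpus cost: each token costs -log2 of its relative frequency.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits

# Toy data (assumed): three transliterated nouns with and without the plural suffix "ha".
whole = [["ketab"], ["ketabha"], ["deraxt"], ["deraxtha"], ["gol"], ["golha"]]
split = [["ketab"], ["ketab", "ha"], ["deraxt"], ["deraxt", "ha"], ["gol"], ["gol", "ha"]]

if __name__ == "__main__":
    print("unsegmented:", round(description_length(whole), 1), "bits")
    print("segmented:  ", round(description_length(split), 1), "bits")
```

On this toy data the segmented analysis yields the shorter description, because the shared suffix is stored once in the lexicon instead of being spelled out inside every plural form. An MDL-based search would prefer segmentations that lower this total.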

Citations: cited by 3 publications (2 citation statements)
References: 11 publications (6 reference statements)
“…The Finite-State Morphology (FSM) has covered both morphological analysis and generation aspects of computational morphology addressing various languages some of which have been well-equipped by NLP and CL tools and some not. While resourceful languages such English [Beesley and Karttunen 2003;Minnen et al 2001] and German [Schmid 2005] are front-runners in FSM studies, we also observe similar scholarly attempts regarding other languages which might not be considered as widely-studied as English or German such as Uralic languages [Novák 2015], Arabic [Soudi et al 2007] and Persian [Arabsorkhi and Shamsfard 2006;Megerdoomian et al 2000]. This is also correct for less-studied languages such as Croatian [Mihajlović 2014].…”
Section: Related Work (supporting)
confidence: 79%
“…Although they segmented very low portion of Persian words (only some Persian verbs), the quality of their machine translation system increases by 1.9 points of BLEU score. Arabsorkhi and Shamsfard (2006) proposed a Minimum Description Length (MDL) based algorithm with some improvements for discovering the morphemes of Persian language through automatic analysis of corpora.…”
Section: Related Work (mentioning)
confidence: 99%