2016
DOI: 10.1162/coli_a_00243

A Comparative Study of Minimally Supervised Morphological Segmentation

Abstract: This article presents a comparative study of a subfield of morphology learning referred to as minimally supervised morphological segmentation. In morphological segmentation, word forms are segmented into morphs, the surface forms of morphemes. In the minimally supervised data-driven learning setting, segmentation models are learned from a small number of manually annotated word forms and a large set of unannotated word forms. In addition to providing a literature survey on published methods, we present an in-d…

Cited by 22 publications (15 citation statements)
References 32 publications (97 reference statements)
“…However, recall for the stm+suf+suf remains low for all compared systems. The boundary between two suffixes is the most difficult for Morfessor to place correctly (Ruokolainen et al, 2016).…”
Section: Results
Citation type: mentioning
confidence: 99%
“…This method lacks the handling of representing contextual dependencies, such as stem and affix orders. Although several other unsupervised segmentation approaches have been proposed, it was shown by [9] that minimally supervised approaches provided better performance compared to solely unsupervised methods applied on large unlabeled datasets. For example, unsupervised experiments for Estonian, that achieved 73.3% F1 score with 3.9 million words, was outperformed by a supervised CRF that attained 82.1% with just 1000 word forms.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…This may be justified by, for example, the automation of phonematic transcription and morpheme segmentation, the development of which involved tests of multiple supervised, semi-supervised and unsupervised machine learning solutions. The final solutions based on Conditional Random Fields and Support Vector Machines classifier with linear kernel outperformed other methods, especially those based on unsupervised and semisupervised machine learning techniques [5][6][7][8].…”
Section: National Photocorpus Of Polish
Citation type: mentioning
confidence: 99%
“…Among the difficulties that arise in the course of work are cases of two people sharing the same name. 5 Other sources of problems are spelling reforms, variant forms of last names, Polish spellings of foreign names (e.g. Schmidt/Szmit), occurrences of last names with no first names, and difficulties in distinguishing proper nouns from common nouns (e.g.…”
Section: Biography Of the Nation
Citation type: mentioning
confidence: 99%