Morphological Disambiguation of Turkish Text with Perceptron Algorithm

Sak, Haşim; Güngör, Tunga; Saraçlar, Murat

doi:10.1007/978-3-540-70939-8_10

Cited by 54 publications (47 citation statements)

References 13 publications

“…3 We used 10 different classification algorithms in the WEKA toolkit. The results of the classification algorithms and the previous approaches are given in Table 3.…”

Section: Results (mentioning, confidence: 99%)

“…The Perceptron Algorithm [3] is a combination of statistical and machine learning approaches. They use the Baseline Trigram-Based Model to generate n-best parses for each sentence.…”

Section: Morphological Disambiguation (mentioning, confidence: 99%)
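The statement above describes a two-stage pipeline: a baseline trigram-based model proposes n-best parse sequences for each sentence, and a perceptron reranks them. The Python sketch below illustrates generic averaged-perceptron reranking over such n-best lists. It is a minimal sketch under stated assumptions, not the authors' implementation or feature set: the candidate representation (tuples of tag strings), the unigram/bigram feature map, the helper names `features`, `score`, `rerank` and `train`, and the toy data are all hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical representation: a candidate is one parse sequence for a sentence,
# written as a tuple of tag strings (one per word).
Candidate = Tuple[str, ...]
Weights = Dict[Hashable, float]


def features(cand: Candidate) -> Dict[Hashable, float]:
    """Hypothetical feature map: counts of tag unigrams and tag bigrams."""
    feats: Dict[Hashable, float] = {}
    prev = "<s>"
    for tag in cand:
        for f in (("uni", tag), ("bi", prev, tag)):
            feats[f] = feats.get(f, 0.0) + 1.0
        prev = tag
    return feats


def score(w: Weights, cand: Candidate) -> float:
    """Linear score: dot product of the weights and the candidate's features."""
    return sum(w.get(f, 0.0) * v for f, v in features(cand).items())


def rerank(w: Weights, nbest: Sequence[Candidate]) -> Candidate:
    """Return the highest-scoring candidate; max() keeps the first on ties,
    so with all-zero weights this falls back to the baseline's 1-best."""
    return max(nbest, key=lambda c: score(w, c))


def train(data: List[Tuple[List[Candidate], Candidate]], epochs: int = 5) -> Weights:
    """Averaged perceptron over (n-best list, gold sequence) pairs.

    Assumes the gold sequence appears in its n-best list."""
    w: Weights = {}
    w_sum: Weights = {}
    steps = 0
    for _ in range(epochs):
        for nbest, gold in data:
            pred = rerank(w, nbest)
            if pred != gold:
                # Move weights toward the gold features and away from the prediction.
                for f, v in features(gold).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in features(pred).items():
                    w[f] = w.get(f, 0.0) - v
            # Accumulate the current weights after every example for averaging.
            for f, v in w.items():
                w_sum[f] = w_sum.get(f, 0.0) + v
            steps += 1
    return {f: v / steps for f, v in w_sum.items()}


if __name__ == "__main__":
    # Toy n-best lists; the first candidate plays the role of the baseline's 1-best.
    data = [
        ([("Noun", "Verb"), ("Adj", "Verb")], ("Adj", "Verb")),
        ([("Noun", "Noun"), ("Adj", "Noun")], ("Adj", "Noun")),
    ]
    w = train(data)
    for nbest, gold in data:
        print(rerank(w, nbest), gold)
```

Averaging the weights over all training steps is a common way to make the perceptron less sensitive to the order of the last few updates; breaking ties by list order means the reranker can only depart from the baseline's top choice when the learned features prefer another candidate.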

“…To cope with the data sparseness problem, morphological parses are divided into smaller parts called inflectional groups [2]. The most recent approach to the morphological disambiguation problem is presented in [3]. The methodology employed is based on ranking of the most possible parse sequences (determined by the baseline statistical model represented in [1]) with Perceptron algorithm.…”

Section: Introduction (mentioning, confidence: 99%)
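The statement above mentions dividing morphological parses into inflectional groups (IGs) to cope with data sparseness. A minimal Python sketch of that split follows. It assumes a parse notation in which `+` separates the root and the tags and `^DB` marks a derivational boundary, with each IG being the tag sequence between two boundaries; the example parse of *sağlamlaştırmak* ("to strengthen") and its tag names are illustrative, and `split_igs` is a hypothetical helper, not part of any cited tool.

```python
from typing import List, Tuple


def split_igs(parse: str) -> Tuple[str, List[List[str]]]:
    """Split 'root+Tag+...^DB+Tag+...' into (root, inflectional groups).

    Hypothetical helper: assumes '^DB' marks derivational boundaries and
    '+' separates the root and the tags within each group.
    """
    chunks = parse.split("^DB")
    root, *first_tags = chunks[0].split("+")
    igs = [first_tags]
    for chunk in chunks[1:]:
        # Later chunks start with '+' (e.g. '+Verb+Become'); drop it before splitting.
        igs.append(chunk.lstrip("+").split("+"))
    return root, igs


if __name__ == "__main__":
    # Illustrative parse of "sağlamlaştırmak"; tag names are examples only.
    parse = "sağlam+Adj^DB+Verb+Become^DB+Verb+Caus+Pos^DB+Noun+Inf1+A3sg+Pnon+Nom"
    root, igs = split_igs(parse)
    print(root)
    for ig in igs:
        print("+".join(ig))
```

The idea is that a statistical model can then estimate statistics over individual IGs, which recur far more often across words than complete parses do.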

A Novel Approach to Morphological Disambiguation for Turkish

Görgün

Yıldız

2011

Computer and Information Sciences II

In this paper, we propose a classification-based approach to morphological disambiguation for Turkish. Due to the complex morphology of Turkish, a word can take an unlimited number of affixes, resulting in very large tag sets. The problem is defined as choosing one of the parses of a word without taking the root word into consideration. We trained our model with well-known classifiers using the WEKA toolkit and tested it on a common test set. The best performance, 95.61%, was achieved by the J48 tree classifier.

“…Finally, we generate the normalized word forms from the now disambiguated sequence of morphemes. Our initial results are comparable to morphological disambiguation on Turkish texts, despite the fact that we have a much smaller training corpus (∼ 2800 sentences, compared to over 50,000 (Görgün and Yildiz, 2011) and 45,000 sentences (Sak et al, 2007)). A possible explanation is that Turkish morphology is more complex: Turkish has more productive suffixes than Quechua, and there are relatively complex morpho-phonological rules that determine word formation, such as two dimensional vowel harmony and context-sensitive realizations of consonants (Oflazer, 1994).…”

Section: Discussion (mentioning, confidence: 55%)

Morphological Disambiguation and Text Normalization for Southern Quechua Varieties

Gonzales

Mamani

2014

Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

We built a pipeline to normalize Quechua texts through morphological analysis and disambiguation. Word forms are analyzed by a set of cascaded finite-state transducers that split the words and rewrite the morphemes to a normalized form. However, some of these morphemes, or rather morpheme combinations, are ambiguous, which may affect the normalization. For this reason, we disambiguate the morpheme sequences with conditional random fields. Once we know the individual morphemes of a word, we can generate the normalized word form from the disambiguated morphemes.

“…There are also several constraint-based methods for disambiguation [18,19]. Another method employs a perceptron algorithm for morphological disambiguation [20]. We use the tool produced by this study as a morphological parser ranging from preparing the corpus to the online question generation.…”

Section: Related Work (mentioning, confidence: 99%)

Morphological Annotation of a Corpus with a Collaborative Multiplayer Game

Gungor

Güngör

2010

Computational Linguistics and Intelligent Text Processing

Self-citation

In most natural language processing tasks, state-of-the-art systems usually rely on machine learning methods to build their mathematical models. Given that the majority of these systems employ supervised learning strategies, a corpus annotated for the problem area is essential. The current method for annotating a corpus is to hire several experts and have them annotate the corpus manually or with helper software. However, this method is costly and time-consuming. In this paper, we propose a novel method that aims to solve these problems. By employing a multiplayer collaborative game that ordinary people can play on the Internet, it seems possible to direct the covert labour force so that people contribute simply by playing a fun game. Through a game site that incorporates some functionality inherited from social networking sites, people are motivated to contribute to the annotation process by answering questions about the underlying morphological features of a target word. The experiments show that 63.5% of the actual question types are successful, based on a two-phase evaluation.
