2009
DOI: 10.4314/lex.v18i1.47257

Improving the Computational Morphological Analysis of a Swahili Corpus for Lexicographic Purposes

Abstract: Computational morphological analysis is an important first step in the automatic treatment of natural language and a useful lexicographic tool. This article describes a corpus-based approach to the morphological analysis of Swahili. We particularly focus our discussion on its ability to retrieve lemmas for word forms and evaluate it as a tool for corpus-based dictionary compilation.

Cited by 12 publications
(11 citation statements)
References 7 publications
“…An increasing number of publications however are showing that carefully selected corpus-based procedures can indeed bootstrap language technology for languages such as Amharic (Gambäck et al 2009), Northern Sotho (de Schryver and Faaß et al 2009), Swahili (De Pauw et al 2006; De Pauw and de Schryver 2008; Steinberger et al, this volume), Tswana (Groenewald 2009) and even very resource-scarce African languages (Badenhorst et al, this volume; Scannell, this volume). This paper continues this novel and promising new trend in African language technology research, by presenting the development and deployment of the SAWA corpus, a two-million-word parallel corpus English-Swahili.…”
Section: Introduction (mentioning)
confidence: 93%
“…To perform this kind of morphological analysis, we developed a machine learning system trained and evaluated on the Helsinki corpus of Swahili (Hurskainen, 2004). Experimental results show that the data-driven approach achieves state-of-the-art performance in a direct comparison with a rule-based method, with the added advantage of being robust to word forms for previously unseen lemmas (De Pauw and de Schryver, 2008). We can consequently use morphological deconstruction as a preprocessing step for the alignment task, similar to the method described by Goldwater and McClosky (2005), Oflazer (2008) and Stymne et al (2008).…”
Section: Nimemkatalia (mentioning)
confidence: 99%
“…There is scarcity of sources in the sense that the digital text resources are few. The recent effort on the same is handled carefully with selected procedure for Swahili [2,3]. For language technology applications such as speech recognition system, text-to-speech synthesis, machine aided translation and web related issues there is a great need for translation and usability of the Swahili language.…”
Section: Introduction (mentioning)
confidence: 99%