An innovative online Swahili-English dictionary project is presented. A careful study of some of the log files attached to this reference work reveals some hitherto unknown aspects of true dictionary look-up behaviour, which results in the depreciation of the importance of corpora for dictionary making. Three lexicography software modules are advanced to further enhance the success of the online dictionary.
In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as such the encouraging outcomes of this study are far-reaching.
Abstract: Computational morphological analysis is an important first step in the automatic treatment of natural language and a useful lexicographic tool. This article describes a corpus-based approach to the morphological analysis of Swahili. We particularly focus our discussion on its ability to retrieve lemmas for word forms and evaluate it as a tool for corpus-based dictionary compilation.
Worldwide, semi-automatically extracting terms from corpora is becoming the norm for the compilation of terminology lists, term banks or dictionaries for special purposes. If African-language terminologists are willing to take their rightful place in the new millennium, they must not only take cognisance of this trend but also be ready to implement the new technology. In this article it is advocated that the best way to do the latter two at this stage is to opt for computationally straightforward alternatives (i.e. use 'raw corpora') and to make use of widely available software tools (e.g. WordSmith Tools). The main aim is therefore to discover whether or not the semi-automatic extraction of terminology from untagged and unmarked running text by means of basic corpus query software is feasible for the African languages. In order to answer this question a full-blown case study revolving around Northern Sotho linguistic texts is discussed in great detail. The computational results are compared throughout with the outcome of a manual excerption, and vice versa. Attention is given to the concepts 'recall' and 'precision'; different approaches are suggested for the treatment of single-word terms versus multi-word terms; and the various findings are summarised in a Linguistics Terminology lexicon presented as an Appendix.
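As an illustration of the 'recall' and 'precision' concepts mentioned in this abstract, the comparison between an automatically extracted term list and a manual excerption can be sketched as follows. This is a minimal sketch under stated assumptions, not the article's own procedure: the function name `precision_recall` and the placeholder term lists are hypothetical, and the manual excerption is simply assumed to serve as the reference list.

```python
def precision_recall(extracted, reference):
    """Compare an extracted term list against a reference (manual) list.

    precision = share of extracted terms that are also in the reference
    recall    = share of reference terms that were extracted
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = extracted & reference
    precision = len(true_positives) / len(extracted) if extracted else 0.0
    recall = len(true_positives) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical placeholder terms, not data from the article.
manual = {"term_a", "term_b", "term_c", "term_d"}
automatic = {"term_a", "term_b", "term_x"}

p, r = precision_recall(automatic, manual)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under these assumptions, high precision with low recall would suggest that the extraction is reliable but misses many terms, while the reverse would suggest a noisy but comprehensive candidate list.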
Abstract: The aim of this article is to investigate, from a lexicographic perspective, the preferences of Northern Sotho mother-tongue speakers for loan words versus so-called 'traditional' or 'original' counterparts in the language. Results obtained from a survey conducted among 100 randomly selected mother-tongue speakers from different age and gender groups, backgrounds, places of residence, etc. will be analysed. It is shown that although the overwhelming preference of the respondents lies with the use of (more) indigenous words in comparison to loan words, lexicographers should be alerted to possible, even rapid, changes in this preference pattern. The results from the survey are compared throughout with frequency counts derived from a corpus as well as with current dictionary treatment.
Keywords: LEXICOGRAPHY, DICTIONARY, LEMMATISATION, NORTHERN SOTHO (SEPEDI), LOAN WORD, SOTHOISED WORD, INDIGENOUS WORD, QUESTIONNAIRE, CORPUS, DESCRIPTIVENESS, PROSCRIPTIVENESS, PRESCRIPTIVENESS, PREFERENCE PATTERN
Senaganwa: Maadingwa ge a bapetšwa le Mantšu a Setlogo go Sesotho sa Leboa - Kgopolo ya Bangwalapukuntšu. Maikemišetšo a taodišwana ye ke go nyakišiša, go ya ka kgopolo ya bangwalapukuntšu, ka fao baboledi ba Sesotho sa Leboa ba dirago kgetho ya mantšu magareng ga maadingwa le mantšu a setlogo polelong ye. Dipoelo tše di hweditšwego go tšwa go bakgathatema ba e lego baboledi ba Sesotho sa Leboa, banna le basadi, ba lekgolo (100) ba mengwaga ya go fapana, maemo a a fapanego a thuto, ba ba dulago mafelong ao a fapafapanego, bj.bj. di tla fetlekwa. Go ipontšha gore le ge dipoelo tša nyakišišo ye di laetša gore bontši bja bakgathatema bo kgetha go šomiša mantšu a setlogo go ena le maadingwa, bangwadi ba *