The African Wordnet Project (AWN) aims to build wordnets for five African languages: Setswana, isiXhosa, isiZulu, Sesotho sa Leboa (also referred to as Sepedi or Northern Sotho) and Tshivenda. Currently, the so-called expand model, based on the structure of the English Princeton WordNet (PWN), is used to continually develop the African Wordnets manually. This is labour-intensive work that needs to be performed by linguistic experts, guided by several considerations such as the level of lexicalisation of a term in the African language. Up to now, linguists have been responsible for identifying and translating appropriate synsets without much help from electronic resources because, in the case of African languages, even basic resources such as computer-readable and electronic bilingual wordlists are usually not freely available. Methods to speed up the manual development of synsets and ease the workload of the human language experts were recently investigated. These centred on using the minimal amount of information available in bilingual dictionaries to identify synsets in the PWN that should be included in the AWN, transferring information from dictionaries to the wordnet, and presenting the potential synsets to linguists for final approval and inclusion in the wordnets. In this article, we describe the methodology developed for building the African Wordnets, a potentially significant resource for natural language processing applications. Available resources that could be taken advantage of and resources that had to be developed are investigated, and initial results and future plans are explained.
Transcending the boundaries of printed lexicographic resources is becoming easier in the digital age, with e-resources easing restrictions on the size and type of information that can be included. In this article we explore innovative ways of documenting and preserving African indigenous knowledge, often underrepresented in traditional dictionaries, in an existing digital lexical database. Our approach is based on the extension of the African Wordnet, a lexical database under construction for nine African languages, in this case applied to isiZulu. This article addresses the challenge of consolidating dispersed indigenous knowledge, collected from a variety of sources such as conventional dictionaries, interdisciplinary publications and a flat-structured online database, in a digitised hierarchical wordnet structure. A representative sample of traditional domestic utensils in Zulu culture is used to demonstrate the conversion into a set of typical semantic relations in a wordnet structure. By focusing on filling lexical gaps between isiZulu and English, as found in the Princeton WordNet, with culturally relevant synsets, the African Wordnet also becomes a useful resource for natural language processing. Finally, it is shown how the hierarchical classification of selected domestic utensils is visually presented in wordnet graphs in the Wordnet-Loom interface.
Optimization of Free Online/Electronic Resources for Dictionary Compilation: A Trilingual Dictionary Experiment. The availability of multilingual dictionaries is crucial, not only for direct target users, but also for indirect target users, especially in the case of languages with scarce resources such as Venda. This article explores the optimal use of free electronic/online resources for compiling a trilingual e-dictionary for Venda, English and Afrikaans. Our approach is based on an experiment in which the compilation process was automated as far as possible to save time and manpower. English is used as a bridge for the translation between the source language, Venda, and the target language, Afrikaans. The general finding is that certain limitations can be expected in such a semi-automated process, which still requires a certain amount of human intervention. Although the composite e-dictionary cannot be considered a final product, the dictionary compilation program Lexonomy, which was used successfully in this study because of its adaptability and easy layout, provides the opportunity for human input to make the necessary adaptations in a user-friendly manner. The proposed concept is useful for creating multilingual online dictionaries compiled from available online or electronic resources. The resulting trilingual dictionary is available online as a proof of concept on which further work can build. The database underlying the dictionary is available in a machine-readable format, namely XML, which is important for indirect target users, who can reuse it to develop electronic resources, especially for resource-scarce languages. Keywords: dictionary compilation, Venda–English–Afrikaans, trilingual dictionary, electronic/online resources, machine translation systems, Lexonomy, corpus search, target users