This paper shows how morphological analysis contributes to solving the challenges posed by the development of a spelling checker for an agglutinative language like isiZulu. It demonstrates how the incremental implementation of affix removal rules can be used to derive word forms and enhance the lexical and error recall of the system. In the case of the spelling checker, the strategies used are based mainly on regular expressions, and more specifically on a process of stemming.
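The regular-expression stemming idea can be sketched roughly as follows. This is a minimal illustration and not the paper's actual rule set: the lexicon `STEMS`, the pattern `PREFIX_RULE`, and both function names are hypothetical. Only three noun prefixes are covered, and suffix removal is left out entirely.

```python
import re

# Hypothetical mini-lexicon of stems; a real checker would use a large curated list.
STEMS = {"ntu"}

# Illustrative prefix-removal rule covering only three isiZulu noun prefixes
# (as in umu-ntu, aba-ntu, ubu-ntu). This is an assumption for the sketch,
# not the rule set described in the paper.
PREFIX_RULE = re.compile(r"^(aba|ubu|umu)")


def candidate_stems(word):
    """Return the lowercased word plus the form left after stripping one prefix."""
    word = word.lower()
    candidates = [word]
    stripped = PREFIX_RULE.sub("", word, count=1)
    if stripped != word:
        candidates.append(stripped)
    return candidates


def is_accepted(word):
    """Accept a word if it, or its prefix-stripped form, is in the lexicon."""
    return any(candidate in STEMS for candidate in candidate_stems(word))


if __name__ == "__main__":
    for token in ["abantu", "umuntu", "abantuu"]:
        print(token, is_accepted(token))
```

Adding rules incrementally, as the abstract describes, would correspond here to extending the pattern and the lexicon. Each new rule lets more valid word forms be recognised without listing every inflected form explicitly.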
In this paper, we present a project in which existing text-based core technologies were ported from various architectures to Java-based web services. These technologies were developed over a period of eight years through various government-funded projects for 10 resource-scarce languages spoken in South Africa. We describe the API and a simple web front-end capable of completing various predefined tasks.
This article provides an overview of the process and initial outcomes of designing a multilingual corpus of academic texts produced by university students with different mother tongues in South Africa, with a view to making it available as an open resource for pedagogical applications and research. We first give an overview of the history of corpus development for pedagogical purposes worldwide, with particular emphasis on learner corpora, and highlight the absence of a South African corpus of academic learner texts. Thereafter, the objectives of the corpus project are outlined. The remainder of the article describes and justifies the design features of the corpus as well as the process of setting up the data management system to facilitate the collection of the learner texts and their integration with the metadata. We conclude with a summary of the current status of the project, including the limitations, and a preview of the way forward.
Likert-type data is commonly used in many research fields in the humanities: from gauging the usability of different user-interface designs, to determining users’ likelihood of voting for a particular political party, to evaluating course materials, to name but a few examples. Despite its prevalence, there is still some disagreement within the statistics community on whether Likert-type scales are true ordinal variables, and by implication whether parametric tests may legitimately be used in such cases (Endresen & Janda 2017). In this paper, we explore one parametric statistical test, viz. cumulative odds ordinal logistic regression (OLR), as an analysis method for self-reported data in the humanities. For illustration purposes, our focus is specifically on data of users’ self-reported usage of, and attitudes towards, swearwords, with the aim of identifying demographic attributes that are predictive of their usage and/or attitudes. After a brief description of the data we are using, including how the data is being collected, we give a layman’s overview of OLR. Since one of our aims is to demonstrate the usability of OLR, we apply our discussion practically to a step-by-step procedure (based on Laerd Statistics 2015) that could be followed easily. We demonstrate the usefulness of the results in reporting on the usage of, and attitudes towards, two near-synonymous Afrikaans swearwords. We show, among other things, that the odds ratios that are generated as part of the modelling procedure can be used to draw direct conclusions about specific demographic groups.
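For orientation, the cumulative odds model is commonly written in the form below. The notation is an assumption of this sketch, not taken from the paper: $Y$ is the ordinal response with categories $1,\dots,J$, $x$ is the vector of predictors, $\theta_j$ are the thresholds, and $\beta$ is the coefficient vector. Sign conventions differ between software packages.

```latex
\[
\operatorname{logit} P(Y \le j \mid x)
  = \ln \frac{P(Y \le j \mid x)}{P(Y > j \mid x)}
  = \theta_j - \beta^{\top} x ,
  \qquad j = 1, \dots, J-1 .
\]
% Under this parameterisation, a one-unit increase in predictor x_k
% multiplies the odds of a higher response, P(Y > j) / P(Y <= j),
% by the same factor for every threshold j (the proportional odds assumption):
\[
\mathrm{OR}_k = e^{\beta_k} .
\]
```

Under these assumptions, an odds ratio above 1 for a demographic attribute indicates higher odds of responding in a higher category, for example more frequent self-reported usage.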