Elsabé Taljard scite author profile

Worldwide, semi-automatically extracting terms from corpora is becoming the norm for the compilation of terminology lists, term banks or dictionaries for special purposes. If African-language terminologists are willing to take their rightful place in the new millennium, they must not only take cognisance of this trend but also be ready to implement the new technology. In this article it is advocated that the best way to do the latter two at this stage is to opt for computationally straightforward alternatives (i.e. use 'raw corpora') and to make use of widely available software tools (e.g. WordSmith Tools). The main aim is therefore to discover whether or not the semi-automatic extraction of terminology from untagged and unmarked running text by means of basic corpus query software is feasible for the African languages. In order to answer this question a full-blown case study revolving around Northern Sotho linguistic texts is discussed in great detail. The computational results are compared throughout with the outcome of a manual excerption, and vice versa. Attention is given to the concepts 'recall' and 'precision'; different approaches are suggested for the treatment of single-word terms versus multi-word terms; and the various findings are summarised in a Linguistics Terminology lexicon presented as an Appendix.

Corpus-based linguistic investigation for the South African Bantu languages: a Northern Sotho case study

Taljard

2006

South African Journal of African Languages

On the development of a tagset for Northern Sotho with special reference to the issue of standardisation

et al. 2008

Working with corpora in the South African Bantu languages has up till now been limited to the utilisation of raw corpora. Such corpora, however, have limited functionality. Thus the next logical step in any NLP application is the development of software for automatic tagging of electronic texts. The development of a tagset is one of the first steps in corpus annotation. The authors of this article argue that the design of a tagset cannot be isolated from the purpose of the tagset, or from the place of the tagset and its design within the bigger picture of the architecture of corpus annotation. Usage-related aspects therefore feature prominently in the design of the tagset for Northern Sotho. It is explained why this proposed tagset is biased towards human readability, rather than machine readability; the choice of a stochastic tagger is motivated, and the relationship between tokenising, tagging, morphological analysis and parsing is discussed. In order to account at least to some extent for the morphological complexity of Northern Sotho at the tagging level, a multilevel annotation is opted for: the first level comprising obligatory information and the second optional and recommended information. Finally, aspects of standardisation are considered against the background of reuse, of sharing of resources, and of possible adaptation for use by other disjunctively written South African Bantu languages. It is not the aim of this article to evaluate the results of any tagging procedure using the proposed tagset. It only describes the design and motivates the choices made with regard to the tagset design. However, an evaluation is in process and results will be published in the near future (cf. Faaß et al., s.a.).

Compiling a Corpus-based Dictionary Grammar: An Example for Northern Sotho

Schryver

Taljard

2011

Lex

Abstract: In this article it is shown how a corpus-based dictionary grammar may be compiled, that is, a mini-grammar fully based on corpus data and specifically written for use in and integrated with a dictionary. Such an effort is, to the best of our knowledge, a world's first. We exemplify our approach for a Northern Sotho mini-grammar, to be included in a Northern Sotho-English dictionary.

Keywords: LEXICOGRAPHY, DICTIONARY, CORPUS, FREQUENCY, MIDDLE MATTER, DICTIONARY GRAMMAR, NORTHERN SOTHO (SESOTHO SA LEBOA)

Summary (translated from the Dutch "Samenvatting"): Compiling a corpus-based dictionary grammar: an example for Northern Sotho. This article shows how a corpus-based dictionary grammar can be compiled, that is, a mini-grammar that draws all its data directly from a corpus and that was written specifically for use in a dictionary, with which it is also fully integrated. Such an attempt is, as far as we know, a world first. We illustrate our approach for a mini-grammar of Northern Sotho, intended for use in a Northern Sotho-English dictionary.

Keywords (translated from the Dutch "Sleutelwoorden"): LEXICOGRAPHY, DICTIONARY, CORPUS, FREQUENCY, MIDDLE MATTER, DICTIONARY GRAMMAR, NORTHERN SOTHO

Using corpora beyond a dictionary's central section(s)

It is now widely accepted that the use of electronic corpora has become indispensable in modern dictionary making, and this on a variety of levels. But just on how many levels? The macrostructural and microstructural levels immediately spring to mind, and most attention in the scientific literature has indeed also gone to aspects revolving around the corpus-based selection of lemma signs on the one hand, and the corpus-based construction of articles attached to those lemma signs on the other. Any self-respecting dictionary, however, contains much more than 'just' the central text. Good dictionaries also comprise extra matter, invariably distributed across front, middle and back matter sections.

If one is serious about corpus-based lexicography, then the extra matter should also be rooted in corpus data. One can come a long way by making sure there is a one-to-one correlation between the central (corpus-based) section(s) and the extra matter (cf. below), but during practical dictionary making this quickly proves not to be sufficient. In this article the focus will be on the creation of a corpus-based dictionary grammar, exemplified for Northern Sotho. The core principles of corpus-based lexicography will be briefly reviewed in order to set the stage, but that review is merely incidental and the reader is referred to Sinclair (1987) and Corréard (2002) for what remain to this day the best collections on the topic.

Corpus-based lexicography in a nutshell

In corpus-based lexicography, the main arbiter during the creation of the (initial) macrostructure is the list of frequencies attached to the lemmatised list of inclusion candidates. Clearly, there are as many lemmatisation policies as there are dictionary teams compiling dictionaries, but it remains comm...

Management and Internal Standardization of Chemistry Terminology: A Northern Sotho Case Study

Taljard

Nchabeleng

2012

Lex

Abstract: One of the many implications of the process of language democratization which started post-1994 in South Africa is the empowerment of the previously marginalized South African Bantu languages to become languages of higher functions, i.e. languages of learning and teaching, and also of scientific discourse. This in turn implies the development, consolidation and especially standardization of terminology for each of these languages, and the compilation of LSP dictionaries. This article describes the terminological processing of a technical source text prior to translation, which formed part of the compilation of a Quadrilingual Explanatory Dictionary of Chemistry. It reports on the model of terminology management that was utilized and explores strategies for the internal standardization of terms in the absence of readily available, standardized chemistry terminology.

Keywords: TERMINOLOGY MANAGEMENT, TERMINOLOGY STANDARDIZATION, NORTHERN SOTHO CHEMISTRY TERMINOLOGY, USERS' PREFERENCES, TERM EXTRACTION, TERM EQUIVALENCE, TECHNICAL TRANSLATION

Summary (translated from the Afrikaans "Opsomming"): Management and internal standardization of chemistry terminology: a Northern Sotho case study. One of the many implications of the process of language democratization that took place in South Africa after 1994 is the empowerment of the previously disadvantaged South African Bantu languages to also become languages of higher functions, that is, languages of teaching and learning, and also languages of scientific discourse. This implies the development, consolidation and especially standardization of terminology for each of these languages, as well as the compilation of subject dictionaries. This article describes the terminological processing of a technical text prior to its translation. The translation forms part of the compilation of a Quadrilingual Explanatory Dictionary of Chemistry. The article reports on the model of terminology management that was used *

Corpus-based language teaching: An African language perspective

Taljard

2012

Southern African Linguistics and Applied Language Studies

Studies on corpus-based language teaching are notably absent within the South African educational context; more so with regard to the teaching of African languages. This article explores the possibilities offered by the availability of an electronic corpus to enhance language teaching, and more specifically, the teaching of Northern Sotho as a second additional language at first year university level to first time learners of the language. Particular attention is paid to corpus-based selection and sequencing of learning material, an activity that has hitherto depended on anecdotal evidence and the intuition of the language teacher. A critical evaluation of existing pedagogical material for Northern Sotho reveals that although excellent sources of reference, these works are inadequate for the purpose of teaching Northern Sotho to first time learners. It is indicated that information gleaned from a corpus provides the language teacher with guidance on both micro and macro level with regard to selection and sequencing of learning content.

The Sepedi Helper Writing Assistant: A User Study

Prinsloo

Taljard

2019

Language Matters

Cultural adaptation and Northern Sotho translation of the Modified Checklist for Autism in Toddlers

Vorster

Kritzinger

Lekganyane

et al. 2022

SAJCE

Background: In recent reviews of autism spectrum disorder screening tools, the Modified Checklist for Autism in Toddlers, Revised with Follow-Up (M-CHAT-R/F) has been recommended for use in lower middle-income countries to promote earlier identification. Aim: The study aim was to culturally adapt and translate the M-CHAT-R/F into Northern Sotho, a South African language. Setting: An expert panel was purposively selected for the review and focus group discussion that was conducted within an academic context. Method: The source translation (English) was reviewed by bilingual Northern Sotho-English speech-language therapists who made recommendations for cultural adaptation. A double translation method was used, followed by a multidisciplinary expert panel discussion and a self-completed questionnaire. Results: Holistic review of test, additional remarks and grammar and phrasing were identified as the most prominent themes of the panel discussion, emphasising the equivalence of the target translation. Conclusion: A South African culturally adapted English version of the M-CHAT-R/F is now available along with the preliminary Northern Sotho version of the M-CHAT-R/F. The two versions can now be confirmed by gathering empirical evidence of reliability and validity.

Semi-automatic Term Extraction for the African Languages, with Special Reference to Northern Sotho
