2010
DOI: 10.4314/lex.v12i1.51353

Semi-automatic Term Extraction for the African Languages, with Special Reference to Northern Sotho

Abstract: Worldwide, semi-automatically extracting terms from corpora is becoming the norm for the compilation of terminology lists, term banks or dictionaries for special purposes. If African-language terminologists are willing to take their rightful place in the new millennium, they must not only take cognisance of this trend but also be ready to implement the new technology. In this article it is advocated that the best way to do the latter two at this stage is to opt for computationally straightforward alternatives …

Citations: cited by 6 publications (8 citation statements)
References: 0 publications
“…It is undeniable that frequency figures are very useful in corpus-based research and that they are normally interpreted in terms of the relevance/importance of the word in the field. Nevertheless, as conceded by Taljard and De Schryver (2002), reading through top-frequency words is obviously an unrefined procedure not necessarily implying significance. This is why, in this paper, the analysis of the image of top-ranked hotels provided by reviewers through EAs will initially be based on the classification of such EAs through a double perspective, based on the frequency but also on the saliency of these EAs.…”
Section: Frequency Saliency Quality and Value: A Proposed Classifi…
Citation type: mentioning
confidence: 99%
“…Furthermore, access to user-friendly and affordable software such as WordSmith Tools opens the door for terminologists to query and analyse these corpora automatically or at least semi-automatically. It has already been illustrated by Taljard and De Schryver (2002) that it is indeed possible to extract terms semi-automatically from corpora based on subject-field texts, thus reducing (but of course not eliminating) the dependence of the terminologist on the co-operation of the subject-field specialist.…”
Section: Electronic Corpora and Terminology - An Overview Of The Curre…
Citation type: mentioning
confidence: 99%
“…It has already been pointed out that it is indeed possible for South African terminologists to compile their own special field corpora, and, by following the methodology suggested by Taljard and De Schryver (2002), to semi-automatically extract terms from electronic texts. Should the terminologist now want to add definitions to the extracted terms, he/she is currently left with two options: (a) to formulate definitions with the help of a subject-field expert, or (b) to provide translational equivalents for the terms in English/Afrikaans, then search for definitions in either an LSP dictionary or existing term lists, and as a last step, translate the definitions from English/Afrikaans into the appropriate Bantu language.…”
Section: Generating Definitional Information: the Current South Afric…
Citation type: mentioning
confidence: 99%
“…However, semi-automatic term extraction does not succeed in extracting all terms from a source text. According to Taljard and De Schryver (2002), semi-automatic term extraction accounts for approximately 60% of terms in a running text. Therefore, computational extraction needs to be complemented by manual term excerption.…”
Section: Terminology Extraction
Citation type: mentioning
confidence: 99%