2010
DOI: 10.4314/lex.v17i1.51535

Dictionary Writing System (DWS) + Corpus Query Package (CQP): The Case of "TshwaneLex"

Abstract: In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques the…

Cited by 5 publications (14 citation statements); references 2 publications (2 reference statements).
“…Using a corpus is not a once-off process, but rather a continuous one. This process is greatly facilitated by software where the dictionary writing system (DWS) and the corpus query package (CQP) are seamlessly integrated (De Schryver and De Pauw 2007).…”
Section: Corpus-based Lexicography in a Nutshell (mentioning)
“…Using a corpus is not a once-off process, but rather a continuous one. This process is greatly facilitated by software where the dictionary writing system (DWS) and the corpus query package (CQP) are seamlessly integrated (De Schryver and De Pauw 2007). As noted in Section 1, in addition to a dictionary's macrostructure(s) and microstructure(s), most dictionaries also contain extra matter material. Numerous examples exist of dictionaries where the extra matter was clearly written in isolation, and if not to that extreme, at least not with a full integration in mind.…”
(mentioning)
“…the stem approach for verbs and the word approach for nouns (e.g. Kriel 1983, de Schryver 2007). For a detailed discussion of the difference between stem lemmatization and word lemmatization and the implications they have on user-friendliness, the reader is referred to Prinsloo (2009).…”
Section: Lemmatization Strategies (mentioning)
“…So there was never an intention to make it commercially viable. However, for the sake of objectivity a superficial comparison to current state-of-the-art expectations of DWSs as set out by, for instance, Joffe and De Schryver (2004) and by De Schryver and De Pauw (2007) may be opportune. It was stated before that the software does not provide for a Corpus Query Package as at the time of its creation electronic text corpora hardly existed in Khoekhoegowab. Most outstanding is the absolute separation of the pre-dictionary database stage and the dictionary compilation phase.…”
Section: Limitations of the Ndp5 Dictionary Writing System (mentioning)