This paper is a contribution to the discussion on compiling computational lexical resources from conventional dictionaries. It describes the theoretical as well as practical problems that are encountered when reusing a conventional dictionary for compiling a lexical-semantic resource in terms of a wordnet. More specifically, it describes the methodological issues of compiling a wordnet for Danish, DanNet, from a monolingual basis, and not, as is often seen, by applying the translational expansion method with Princeton WordNet as the English source. Thus, we apply as our basis a large, corpus-based printed dictionary of modern Danish. Using this approach, we discuss the issues of readjusting inconsistent and/or underspecified hyponymy hierarchies taken from the conventional dictionary, sense distinctions as opposed to the synonym sets of wordnets, generating semantic wordnet relations on the basis of sense definitions, and finally, supplementing missing or implicit information.
This paper deals with the SIMPLE-DK lexicon, a computational lexicon for Danish developed at the Centre for Language Technology in Copenhagen within the European Union project SIMPLE. The general SIMPLE model, on which the Danish lexicon is based, is presented, and the way in which several specific aspects of Danish, such as nominal compounds and time expressions, are accommodated in this model is then described. Phrasal verbs – in particular phrasal motion verbs – are shown to be a challenging phenomenon since they are difficult to place in the SIMPLE event ontology, and pose problems regarding the interpretation of the directional particle they combine with. The encoding strategy that is proposed here accounts for compositional and non-compositional types of phrasal verb, and captures the relation between act-denoting and transition-denoting senses of the same verb in terms of regular polysemy. The final part of the paper deals with the exploitation of SIMPLE-DK as an ontological and lexical source in the Danish project on content-based querying OntoQuery. In the OntoQuery ontology, the structured concepts in SIMPLE-DK are combined with nutrition concepts, and the resulting ontology is used for matching evaluation. It is also discussed how selectional restrictions and qualia roles from SIMPLE-DK can be included in a conceptual grammar to be used for query and text analysis.
This paper describes the language technology methods developed in the Danish research project VID to extract, from Danish text material, information relevant to populating knowledge organization systems (KOS) within specific corporate domains. The results achieved by applying these methods to a prototype search engine tuned to the patent and trademark domain indicate that the use of human language technology can support the construction of a linguistically based KOS and that linguistic information in search improves recall substantially without harming precision (near 90%). Finally, we describe two research experiments where (1) linguistic analysis of Danish compounds is exploited to improve search strategies for such compounds and (2) linguistic knowledge is used to model corporate knowledge into a language-based ontology.