Contributive resources, such as Wikipedia, have proved valuable to Natural Language Processing and multilingual Information Retrieval applications. This work focuses on Wiktionary, the dictionary component of the resources sponsored by the Wikimedia Foundation. In this article, we present our extraction of multilingual lexical data from Wiktionary and its provision to the community as Multilingual Lexical Linked Open Data (MLLOD). This lexical resource is structured using the lemon model.
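As a hedged illustration of what lemon-structured data looks like (the entry URI and lemma below are invented, not taken from the actual MLLOD dataset), a Wiktionary entry can be serialized as lemon-style RDF triples:

```python
# Minimal sketch: emit lemon-style N-Triples for one hypothetical
# Wiktionary entry. The namespaces follow the lemon model; the base
# URI and lemma are illustrative only.

LEMON = "http://lemon-model.net/lemon#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def lemon_entry(base, lemma, lang):
    """Return N-Triples linking a lexical entry to its canonical form."""
    entry = f"{base}/{lemma}"
    form = f"{entry}#canonicalForm"
    return [
        f"<{entry}> <{RDF_TYPE}> <{LEMON}LexicalEntry> .",
        f"<{entry}> <{LEMON}canonicalForm> <{form}> .",
        f'<{form}> <{LEMON}writtenRep> "{lemma}"@{lang} .',
    ]

triples = lemon_entry("http://example.org/mllod", "dictionnaire", "fr")
print("\n".join(triples))
```

The key point of the model is visible in the last triple: the written representation carries a language tag, which is what makes the data usable as *multilingual* linked data.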
After three years of specifying the UNL (Universal Networking Language) and prototyping deconverters from more than 12 languages and enconverters for about 4, the UNL project has opened to the community by publishing the specifications (v2.0) of the UNL language, intended to encode the meaning of natural-language utterances as semantic hypergraphs and to be used as a "pivot" representation in multilingual information and communication systems. A UNL document is an HTML document with special tags delimiting the utterances and their renderings in UNL and in all natural languages currently handled. UNL can thus be viewed as the future "HTML of linguistic content". It is only an interface format, lending itself both to the reuse of existing NLP components and to the development of original tools for a variety of applications, from automatic rough enconversion for information retrieval and information-gathering translation to partially interactive enconversion or deconversion for higher quality. We illustrate these points by describing a UNL-French deconverter organized as a specific "localizer" followed by a classical MT transfer phase and an existing generator.
We propose a lexical organisation for multilingual lexical databases (MLDB). This organisation is based on acceptions (word-senses). We detail this lexical organisation and show a mock-up built to experiment with it. We also present our current work in defining and prototyping a specialised system for the management of acception-based MLDB.
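A minimal sketch of the acception-based organisation (the words and sense identifiers below are illustrative): monolingual entries are never linked directly to each other, only through interlingual acceptions, so one source word with two senses yields two distinct translations.

```python
# Sketch: an acception (interlingual word-sense) links monolingual
# lexical entries across languages; translation goes through the
# acceptions, never directly word-to-word. Data is illustrative.

from collections import defaultdict

class MLDB:
    def __init__(self):
        # acception id -> {language code: word}
        self.acceptions = defaultdict(dict)

    def link(self, acception_id, lang, word):
        self.acceptions[acception_id][lang] = word

    def translate(self, word, src, tgt):
        """All target-language words sharing an acception with `word`."""
        return sorted(
            entry[tgt]
            for entry in self.acceptions.values()
            if entry.get(src) == word and tgt in entry
        )

db = MLDB()
# English "river" covers two acceptions, distinguished in French:
db.link("flows-to-sea", "eng", "river")
db.link("flows-to-sea", "fra", "fleuve")
db.link("flows-to-river", "eng", "river")
db.link("flows-to-river", "fra", "rivière")
print(db.translate("river", "eng", "fra"))  # -> ['fleuve', 'rivière']
```

The design choice this illustrates is that adding an (n+1)-th language requires linking each new monolingual entry to existing acceptions only, rather than to every other language pairwise.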
This article introduces the topic of "multilingual language resources and interoperability". We start with a taxonomy and parameters for classifying language resources. We then provide examples of interoperability issues, together with resource architectures that address them. Finally, we discuss aspects of linguistic formalisms and interoperability.
The motivation of the Papillon project is to encourage the development of freely accessible multilingual lexical resources through online collaborative work on the Internet. For this, we developed a generic community website originally dedicated to the diffusion and development of a particular acception-based multilingual lexical database. The generic design of our platform allows it to be used for the development of other lexical databases: adapting it to a new lexical database is a matter of describing its structures and interfaces in XML files. In this paper, we show how we have already adapted it to other, very different lexical databases. We also outline the future developments needed to gather several lexical-database developer communities into a common network.
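As a hedged sketch of the adaptation step (the element and attribute names below are invented for illustration, not the platform's actual schema), a new database could be declared in an XML descriptor listing its volumes and fields, parsed here with the Python standard library:

```python
# Sketch: parse a hypothetical XML descriptor declaring the structure
# of a lexical database volume. The descriptor format is illustrative,
# not the actual Papillon schema.

import xml.etree.ElementTree as ET

DESCRIPTOR = """\
<database name="MyDict">
  <volume name="MyDict_fra" language="fra">
    <field name="headword" type="string" required="true"/>
    <field name="pos" type="string" required="false"/>
  </volume>
</database>
"""

def load_descriptor(xml_text):
    """Return the database name and a map of volume name -> structure."""
    root = ET.fromstring(xml_text)
    volumes = {}
    for vol in root.iter("volume"):
        volumes[vol.get("name")] = {
            "language": vol.get("language"),
            "fields": [f.get("name") for f in vol.iter("field")],
        }
    return root.get("name"), volumes

name, volumes = load_descriptor(DESCRIPTOR)
print(name, volumes["MyDict_fra"]["fields"])
```

Under this kind of scheme, supporting a new dictionary means writing a new descriptor file rather than changing platform code.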