Europe is a multilingual society, in which dozens of languages are spoken. The only option to enable and to benefit from multilingualism is through Language Technologies (LT), i.e., Natural Language Processing and Speech Technologies. We describe the European Language Grid (ELG), which is targeted to evolve into the primary platform and marketplace for LT in Europe by providing one umbrella platform for the European LT landscape, including research and industry, enabling all stakeholders to upload, share and distribute their services, products and resources. At the end of our EU project, which will establish a legal entity in 2022, the ELG will provide access to approx. 1300 services for all European languages as well as thousands of data sets.
The authors address the legal issues relating to the creation and use of language models. The article begins with an explanation of the development of language technologies. The authors analyse the technological process within the framework of copyright, related rights and personal data protection law. The authors also cover commercial use of language models. The authors' main argument is that legal restrictions applicable to language data containing copyrighted material and personal data usually do not apply to language models. Language models are generally not considered derivative works. Given the wide range of language models, this position is not absolute.
Language resources are very often valuable assets which are offered to the public under the terms of licenses that determine which uses are allowed and under which circumstances. These licenses have typically been published as natural language texts whose specific contents cannot be easily processed by a computer. This paper proposes a structured representation for the most commonly used licenses for language resources, reusing existing vocabularies and extending the Open Digital Rights Language core model. Examples and guidelines for using the 'Rights Information for Language Resources' vocabulary are given.
This paper reports on completed work carried out in the framework of the INTERA project, and specifically on the production of multilingual language resources (LRs) for eContent purposes. The paper presents the methodology adopted for the development of the corpus (acquisition and processing of the textual data), discusses how the initial assumptions diverged from the actual situation met during this procedure, and concludes with a summary of the problems encountered, which undermine the viability of multilingual parallel corpora construction.