Europe is a multilingual society, in which dozens of languages are spoken. The only way to enable multilingualism and benefit from it is through Language Technologies (LT), i.e., Natural Language Processing and Speech Technologies. We describe the European Language Grid (ELG), which is intended to evolve into the primary platform and marketplace for LT in Europe. It does so by providing one umbrella platform for the European LT landscape, including research and industry, that enables all stakeholders to upload, share and distribute their services, products and resources. At the end of our EU project, which will establish a legal entity in 2022, the ELG will provide access to approximately 1300 services for all European languages as well as thousands of data sets.
This chapter provides an overview of the datasets, corpora and other language resources (LRs) available in ELG and of how this has been achieved. We look at the procedures and steps followed to complete the full resource ingestion cycle, which runs from repository and LR identification to metadata description and ingestion, and we explain the approaches, priorities and methodology. The chapter also outlines the repositories that have been integrated into ELG, discussing the different procedures followed (metadata conversion, extraction and completion, as well as harvesting) and the reasons behind these choices. Furthermore, it describes the content of the ELG catalogue, with details on its key elements and features as well as its accomplishments. The last two sections are devoted, respectively, to the crucial legal issues behind such a complex platform and to its data management plan.
To promote quality control of its language resources, the European Language Resources Association (ELRA) set up a Validation Committee. This paper presents an overview of the Committee's current activities: validation of language resources, standardisation, bug reporting, patches and updates for language resources, and dissemination of results.
The current scientific and technological landscape is characterised by the increasing availability of data resources and of processing tools and services. In this setting, metadata have emerged as a key factor in facilitating the management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema for describing Language Resources and Technologies (processing and generation services and tools, models, corpora, term lists, etc.) as well as related entities (e.g., organizations, projects, supporting documents). The schema powers the European Language Grid platform, which aims to be the primary hub and marketplace for industry-relevant Language Technology in Europe. ELG-SHARE is based on various metadata schemas, vocabularies and ontologies, as well as on related recommendations and guidelines.
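As a rough illustration of what a metadata schema that links resources to related entities involves, the following Python sketch models a language resource record with a provider organization, funding projects and supporting documents, plus a few minimal validity checks. It is a simplified sketch under stated assumptions: all class and field names (`LanguageResource`, `resource_type`, `ALLOWED_TYPES`, and so on) are hypothetical and do not reproduce actual ELG-SHARE elements, which are far richer.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import json

# Hypothetical resource types, taken from the kinds of assets the abstract lists;
# NOT the actual ELG-SHARE controlled vocabulary.
ALLOWED_TYPES = {"service", "tool", "model", "corpus", "term list"}

@dataclass
class Organization:
    name: str

@dataclass
class Project:
    name: str

@dataclass
class SupportingDocument:
    title: str

@dataclass
class LanguageResource:
    """Hypothetical, heavily simplified metadata record for one resource."""
    name: str
    resource_type: str
    languages: List[str] = field(default_factory=list)  # e.g. ISO 639-1 codes
    provider: Optional[Organization] = None
    funding_projects: List[Project] = field(default_factory=list)
    documents: List[SupportingDocument] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the minimal checks pass."""
        problems = []
        if not self.name.strip():
            problems.append("name is empty")
        if self.resource_type not in ALLOWED_TYPES:
            problems.append(f"unknown resource_type: {self.resource_type}")
        if not self.languages:
            problems.append("no language given")
        return problems

# Usage: build a record that links a corpus to related entities, then serialise it.
record = LanguageResource(
    name="Example parallel corpus",
    resource_type="corpus",
    languages=["de", "en"],
    provider=Organization(name="Example Lab"),
    funding_projects=[Project(name="Example Project")],
)
print(record.validate())  # prints []
print(json.dumps(asdict(record), indent=2))  # nested entities become nested objects
```

Modelling related entities as separate record types, rather than as free-text fields, is what lets a catalogue cross-link a resource with its provider and funding project; `asdict` recursively flattens the nested records into plain dictionaries suitable for JSON export.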