The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.
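The primary evaluation measure of the 2017 task was the labeled attachment score (LAS): the proportion of words whose predicted head and dependency relation both match the gold annotation. A minimal sketch of that computation (not the official evaluation script, which additionally aligns system and gold tokenizations) might look as follows, assuming gold tokenization and sentences represented as lists of (head, relation) pairs:

    # Illustrative sketch only: labeled attachment score over one sentence.
    def las(gold, predicted):
        """gold, predicted: lists of (head_index, deprel) pairs, one per word."""
        assert len(gold) == len(predicted)
        correct = sum(1 for g, p in zip(gold, predicted) if g == p)
        return correct / len(gold) if gold else 0.0

    # Example: a 4-word sentence with one wrong head -> LAS = 0.75
    gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
    pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "obj")]
    print(las(gold, pred))  # 0.75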
This paper reports on the TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation, using the tool ANNOTATE, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER Treebank. This scheme is an extended and improved version of the NEGRA annotation scheme, and we illustrate in detail the linguistic extensions that were made to the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives, and proper nouns. In addition, the paper presents the query tool TIGERSearch, which was developed in the project to exploit the treebank adequately. We describe the query language, which was designed to make complex queries simple to formulate, and briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.
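As a rough illustration of the kind of structural query TIGERSearch supports, a declarative query roughly of the form #np:[cat="NP"] > #ne:[pos="NE"] asks for an NP node directly dominating a proper noun. The sketch below (the tree encoding and function are invented for exposition and are not the TIGER export format) shows the corresponding search over a toy constituency tree:

    # Hypothetical sketch: find nodes with cat == "NP" that directly dominate
    # a child with pos == "NE" (proper noun), mimicking a dominance query.
    def find_np_over_ne(node, matches=None):
        if matches is None:
            matches = []
        children = node.get("children", [])
        if node.get("cat") == "NP" and any(c.get("pos") == "NE" for c in children):
            matches.append(node)
        for c in children:
            find_np_over_ne(c, matches)
        return matches

    tree = {"cat": "S", "children": [
        {"cat": "NP", "children": [{"pos": "NE", "word": "TIGER"}]},
        {"cat": "VP", "children": [{"pos": "VVFIN", "word": "waechst"}]},
    ]}
    print(len(find_np_over_ne(tree)))  # 1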
We describe an annotation scheme and a tool developed for creating linguistically annotated corpora for non-configurational languages. Since the requirements for such a formalism differ from those posited for configurational languages, several features have been added, influencing the architecture of the scheme. The resulting scheme reflects a stratificational notion of language, and makes only minimal assumptions about the interrelation of the particular representational strata.
Universal dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are centrally used to explain how predicate–argument structures are encoded morphosyntactically in different languages, while morphological features and part-of-speech classes give the properties of words. We argue that this theory is a good basis for cross-linguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.
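As a concrete illustration (a simplified, hand-constructed example rather than an excerpt from any released treebank), UD treebanks are distributed in the CoNLL-U format, which lists one word per line with ten tab-separated columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, and MISC. Spaces stand in for tabs here:

    # text = Dogs bark .
    1   Dogs   dog    NOUN   _   Number=Plur                        2   nsubj   _   _
    2   bark   bark   VERB   _   Mood=Ind|Tense=Pres|VerbForm=Fin   0   root    _   _
    3   .      .      PUNCT  _   _                                  2   punct   _   _

Here the UPOS column carries the universal part-of-speech class, FEATS the morphological features, and HEAD/DEPREL encode the grammatical relation of each word to its syntactic head (0 marking the root).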
In recent years translation quality evaluation has emerged as a major, and at times contentious, topic. The industry view on quality is highly fragmented, in part because different kinds of translation projects require very different evaluation methods. In addition, human and machine translation (MT) quality evaluation methods have been fundamentally different in kind, preventing comparison of the two. The lack of clarity results in uncertainty about whether or not a translation meets requesters' or end users' needs, and leaves providers unclear about what requesters need and want. In response, the EU-funded QTLaunchPad project has developed the Multidimensional Quality Metrics (MQM) framework, an open and extensible system for declaring and describing translation quality metrics using a shared vocabulary of "issue types".
Keywords: Machine Translation, Translation Quality, Metrics, Issue Types.
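As a hypothetical sketch of how a metric could be declared on top of such a shared vocabulary: the issue-type names below follow MQM usage (e.g. Accuracy/Mistranslation, Fluency/Grammar), but the particular selection, severity weights, and scoring formula are invented for illustration and are not prescribed by MQM itself.

    # Hypothetical sketch: a small metric as a selection of MQM issue types
    # with per-severity penalty weights (weights and formula are invented).
    metric = {
        "issue_types": ["Accuracy/Mistranslation", "Accuracy/Omission",
                        "Fluency/Grammar", "Fluency/Spelling"],
        "severity_weights": {"minor": 1, "major": 5, "critical": 10},
    }

    def score(annotations, word_count, metric):
        """annotations: list of (issue_type, severity); returns penalty points per 100 words."""
        penalty = sum(metric["severity_weights"][sev]
                      for issue, sev in annotations
                      if issue in metric["issue_types"])
        return 100.0 * penalty / word_count

    print(score([("Accuracy/Omission", "major"), ("Fluency/Spelling", "minor")], 250, metric))  # 2.4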
We present an implemented approach for domain-restricted question answering from structured knowledge sources, based on robust semantic analysis in a hybrid NLP system architecture. We perform question interpretation and answer extraction in an architecture that builds on a lexical-conceptual structure for question interpretation, which is interfaced with domain-specific concepts and properties in a structured knowledge base. Question interpretation involves a limited amount of domain-specific inferences, and accounts for higher-level quantificational questions. Question interpretation and answer extraction are modular components that interact in clearly defined ways. We derive so-called proto queries from the linguistic representations, which provide partial constraints for answer extraction from the underlying knowledge sources. The search queries we construct from proto queries effectively compute minimal spanning trees from the underlying knowledge sources. Our approach naturally extends to multilingual question answering, and has been developed as a prototype system for two application domains: the domain of Nobel prize winners, and the domain of Language Technology, on the basis of the large ontology underlying the information portal LT WORLD.
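Purely as a hypothetical illustration of the idea of a partial constraint with an open answer variable (the data structures below are invented and are not the paper's actual proto-query representation, which is derived from lexical-conceptual structures and interfaced with an ontology), a proto query for a question like "Who won the Nobel Prize in Physics in 1921?" can be pictured as follows:

    # Toy knowledge base and proto query, invented for exposition.
    kb = [
        {"laureate": "Albert Einstein", "prize": "Physics", "year": 1921},
        {"laureate": "Marie Curie", "prize": "Chemistry", "year": 1911},
    ]

    proto_query = {"answer": "laureate",
                   "constraints": {"prize": "Physics", "year": 1921}}

    def extract(kb, pq):
        """Return values of the answer field for records satisfying all partial constraints."""
        return [rec[pq["answer"]] for rec in kb
                if all(rec.get(k) == v for k, v in pq["constraints"].items())]

    print(extract(kb, proto_query))  # ['Albert Einstein']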
Categorial unification grammars (CUGs) embody the essential properties of both unification and categorial grammar formalisms. Their efficient and uniform way of encoding linguistic knowledge in well-understood and widely used representations makes them attractive for computational applications and for linguistic research. In this paper, the basic concepts of CUGs and simple examples of their application will be presented. It will be argued that the strategies and potentials of CUGs justify their further exploration in the wider context of research on unification grammars. Approaches to selected linguistic phenomena such as long-distance dependencies, adjuncts, word order, and extraposition are discussed.
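To make the core combination step concrete: in a CUG, forward application combines a functor of category X/Y with an argument of category Y by unifying the feature structures involved and returning X. The sketch below uses flat feature dictionaries for brevity; in a full CUG, categories are recursive feature structures and the result inherits information from the unified argument.

    # Illustrative sketch only: forward application with simple feature unification.
    def unify(f, g):
        """Unify two flat feature dicts; return the merged dict, or None on a clash."""
        merged = dict(f)
        for k, v in g.items():
            if k in merged and merged[k] != v:
                return None
            merged[k] = v
        return merged

    def forward_apply(functor, argument):
        """functor: {'result': X, 'arg': Y}; returns X if the argument unifies with Y."""
        if unify(functor["arg"], argument) is None:
            return None
        return functor["result"]

    # "the": NP/N (seeks a noun to its right); "dogs": N with number 'pl'
    the  = {"result": {"cat": "NP"}, "arg": {"cat": "N"}}
    dogs = {"cat": "N", "num": "pl"}
    print(forward_apply(the, dogs))  # {'cat': 'NP'}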