We sketch and illustrate an approach to machine translation that exploits the potential of simultaneous correspondences between separate levels of linguistic representation, as formalized in the LFG notion of codescriptions. The approach is illustrated with examples from English, German and French where the source and the target language sentence show noteworthy differences in linguistic analysis.
We present an implemented compilation algorithm that translates HPSG into lexicalized feature-based TAG, relating concepts of the two theories. While HPSG has a more elaborate principle-based theory of possible phrase structures, TAG provides the means to represent lexicalized structures more explicitly. Our objectives are met by giving clear definitions that determine the projection of structures from the lexicon, and identify "maximal" projections, auxiliary trees and foot nodes.
In this paper we describe an effort to construct a catalogue of syntactic data, exemplifying the major syntactic patterns of German. The purpose of the corpus is to support the diagnosis of errors in the syntactic components of natural language processing (NLP) systems. Two secondary aims are the evaluation of NLP systems components and the support of theoretical and empirical work on German syntax. The data consist of artificially and systematically constructed expressions, including also negative (ungrammatical) examples. The data are organized into a relational data base and annotated with some basic information about the phenomena illustrated and the internal structure of the sample sentences. The organization of the data supports selected systematic testing of specific areas of syntax, but also serves the purpose of a linguistic data base. The paper first gives some general motivation for the necessity of syntactic precision in some areas of NLP and discusses the potential contribution of a syntactic data base to the field of component evaluation. The second part of the paper describes the set up and control methods applied in the construction of the sentence suite and annotations to the examples. We illustrate the approach with the example of verbal government. The section also contains a description of the abstract data model, the design of the data base and the query language used to access the data. The final sections compare our work to existing approaches and sketch some future extensions. We invite other research groups to participate in our effort, so that the diagnostics tool can eventually become public domain.