A hybrid system is described which combines the strengths of manual rule writing and statistical learning, obtaining results superior to those of either method applied separately. The combination of the rule-based system and the statistical one is serial rather than parallel: the rule-based system, which performs partial disambiguation with recall close to 100%, is applied first, and a trigram HMM tagger then runs on its output. An experiment in Czech tagging has been performed with encouraging results.
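The serial combination described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the single toy rule, the English example and the probabilities are hypothetical, and the statistical stage is a bigram Viterbi decoder used for brevity rather than the trigram HMM tagger the abstract reports.

```python
import math


def apply_rules(words, candidates, rules):
    """Rule-based stage: prune candidate tags without ever emptying a set.

    Refusing to remove the last remaining tag mirrors the idea of partial
    disambiguation with very high recall.
    """
    pruned = [set(c) for c in candidates]
    for rule in rules:
        for i in range(len(words)):
            kept = {t for t in pruned[i] if not rule(words, pruned, i, t)}
            if kept:
                pruned[i] = kept
    return pruned


def viterbi(words, candidates, trans, emit, floor=1e-6):
    """Statistical stage: best tag path restricted to the surviving tags.

    Bigram model (a simplification); unseen events get a small floor value.
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    best = {
        t: (lp(trans, ("<s>", t)) + lp(emit, (words[0], t)), [t])
        for t in sorted(candidates[0])
    }
    for i in range(1, len(words)):
        step = {}
        for t in sorted(candidates[i]):
            score, path = max(
                ((s + lp(trans, (p, t)), pth) for p, (s, pth) in best.items()),
                key=lambda item: item[0],
            )
            step[t] = (score + lp(emit, (words[i], t)), path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


def hybrid_tag(words, candidates, rules, trans, emit):
    """Serial pipeline: rules first, then the HMM on what is left."""
    return viterbi(words, apply_rules(words, candidates, rules), trans, emit)


# Hypothetical rule: a VERB reading is dropped right after an unambiguous DET.
def no_verb_after_det(words, cands, i, tag):
    return tag == "VERB" and i > 0 and cands[i - 1] == {"DET"}


words = ["the", "can", "rusts"]
candidates = [{"DET"}, {"NOUN", "VERB"}, {"NOUN", "VERB"}]
trans = {("<s>", "DET"): 0.5, ("DET", "NOUN"): 0.8,
         ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.2}
emit = {("the", "DET"): 0.9, ("can", "NOUN"): 0.3,
        ("rusts", "VERB"): 0.5, ("rusts", "NOUN"): 0.5}

print(hybrid_tag(words, candidates, [no_verb_after_det], trans, emit))
# -> ['DET', 'NOUN', 'VERB']
```

In this toy run the rule resolves "can" on its own, while "rusts" stays ambiguous after the rule stage and is decided by the statistical stage, which is the division of labour a serial design implies.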
Word-level morphosyntactic descriptions, such as "Ncmsn" designating a common masculine singular noun in the nominative, have been developed for all Slavic languages, yet there have been few attempts to arrive at a proposal that would be harmonised across the languages. Standardisation adds to the interchange potential of the resources, making it easier to develop multilingual applications or to evaluate language technology tools across several languages. The process of harmonising morphosyntactic categories, especially for the morphologically rich Slavic languages, is also interesting from a language-typological perspective. The EU MULTEXT-East project developed corpora, lexica and tools for seven languages, with the focus being on morphosyntactic data, including formal, EAGLES-based specifications for lexical morphosyntactic descriptions. The specifications were later extended, so that they currently cover nine languages, five of them from the Slavic family: Bulgarian, Croatian, Czech, Serbian and Slovene. The paper presents these morphosyntactic specifications, giving their background and structure, including the encoding of the tables as TEI feature structures. The five Slavic language specifications are discussed in more depth.
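The positional nature of a description such as "Ncmsn" can be shown with a small decoder. This is only a sketch: the attribute table below is a simplified, illustrative subset for nouns, assumed for the example rather than copied from the specifications, and `decode_msd` is a hypothetical helper name.

```python
# Illustrative subset only: attribute order and codes for nouns are
# assumptions made for this example, not the full specification.
NOUN_ATTRIBUTES = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural", "d": "dual"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "v": "vocative", "l": "locative",
              "i": "instrumental"}),
]

# First character selects the category; the rest are read position by position.
CATEGORIES = {"N": ("Noun", NOUN_ATTRIBUTES)}


def decode_msd(msd):
    """Expand a positional description into attribute-value pairs."""
    if not msd or msd[0] not in CATEGORIES:
        raise ValueError(f"unknown category in {msd!r}")
    name, attributes = CATEGORIES[msd[0]]
    features = {"Category": name}
    for code, (attribute, values) in zip(msd[1:], attributes):
        if code == "-":  # a hyphen marks an attribute that does not apply
            continue
        if code not in values:
            raise ValueError(f"unknown {attribute} code {code!r} in {msd!r}")
        features[attribute] = values[code]
    return features


print(decode_msd("Ncmsn"))
# -> {'Category': 'Noun', 'Type': 'common', 'Gender': 'masculine',
#     'Number': 'singular', 'Case': 'nominative'}
```

The resulting attribute-value dictionary is the same kind of information that a feature-structure encoding makes explicit, which is why a compact string and a structured representation can be treated as two views of one description.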
A detailed morphological description of word forms in any language is a necessary condition for successful automatic processing of linguistic data. The paper focuses on a new description of morphological categories, mainly on the subcategorization of parts of speech in Czech, within the NovaMorf project. NovaMorf aims to describe the morphological properties of Czech word forms more compactly and consistently, and with greater explanatory power, than the approaches used so far. It also aims to unify the diverse approaches to the morphological annotation of Czech. The NovaMorf approach will be reflected in a new morphological dictionary, to be used for a new automatic morphological analysis (and disambiguation) of corpora of contemporary Czech.
A new approach to the formal description of the semantics of a natural language within the Prague group's functional generative description of language is presented. Our approach represents the semantics and the process of the speaker's formulation of a sentence by a pushdown store generator framework comprising three principal features: dependency relations, coordination (apposition) and the topic-focus articulation. The interplay of these semantic components during the generation of a sentence is shown, along with the possibility of an easy refinement of the framework in case special linguistic phenomena are to be described. We do not suppose that a real speaker produces a sentence by means of such a sequential procedure and that he clearly distinguishes the representations of the sentence on the individual levels just described. In the formulation of a sentence by the speaker these levels seem rather to be interlaced and running in parallel.