TAGH: A Complete Morphology for German Based on Weighted Finite State Automata

Geyken, Alexander; Hanneforth, Thomas

doi:10.1007/11780885_7

Cited by 21 publications

(11 citation statements)

References 5 publications

“…We therefore rely on automatic tools. Determining the base form of words (lemmatization) and morphological decomposition are [performed] by the TAGH morphology (Geyken & Hanneforth, 2006) […] (Jurish, 2003) then selects the most probable category from the context. The decisions made by moot are based on a hidden Markov model (Rabiner, 1989) that […] the probabilities of three consecutive words (word-…”

Section: Data collection (unclassified)

dlexDB – eine lexikalische Datenbank für die psychologische und linguistische Forschung

Heister, Würzner, Bubenzer et al. 2011

Psychologische Rundschau

Abstract. With the lexical database dlexDB, we make statistical measures for a wide range of processing-relevant word features available online to psychological and linguistic research. These measures include the variables familiar from CELEX (Baayen, Piepenbrock, and Gulikers, 1995): the frequencies of word forms and lemmas in written-language texts. In addition, we compute a number of new measures, such as the frequencies of syllables, morphemes, character sequences, and multiword combinations, as well as word similarity measures. The data are drawn from the core corpus of the Digitales Wörterbuch der deutschen Sprache (DWDS), which contains more than 100 million running words. We illustrate the validity of these measures with new results on their influence on fixation durations during sentence reading.

“…Most of them are based on finite state machines. Gertwol (Haapalainen and Majorin, 1995), MORPH (Hanrieder, 1996), Morphy (Lezius, 1996; Lezius et al., 1998) and later SMOR (Schmid et al., 2004) and TAGH (Geyken and Hanneforth, 2006) generate morphological analyses for complex German words, yielding results for derivatives and compounds. All these analyses are flat word splittings and often include dozens of segmentation versions.…”

Section: Related Work (mentioning)

confidence: 99%

Augmenting a German Morphological Database by Data-Intense Methods

Steiner 2019

Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper deals with the automatic enhancement of a new German morphological database. While there are some databases for flat word segmentation, this is the first available resource which can be directly used for deep parsing of German words. We combine the entries of this morphological database with the morphological tools SMOR and Moremorph and a context-based evaluation method which builds on a large Wikipedia corpus. We describe the state of the art and the essential characteristics of the database and the context method. The approach is tested on a Lufthansa inflight magazine. We derive over 5,000 new instances of complex words. The coverage for the lemma types reaches over 99 percent. The precision of newly found complex splits and monomorphemes is between 0.93 and 0.99.

“…Historical text presents numerous challenges for contemporary natural language processing techniques. In particular, the absence of consistent orthographic conventions in historical text presents difficulties for any system requiring reference to a fixed lexicon accessed by orthographic form, such as document indexing systems (Sokirko, 2003; Cafarella and Cutting, 2004), part-of-speech taggers (DeRose, 1988; Brill, 1992; Schmid, 1994), simple word stemmers (Lovins, 1968; Porter, 1980), or more sophisticated morphological analyzers (Geyken and Hanneforth, 2006; Clematide, 2008).…”

Section: Introduction (mentioning)

confidence: 99%

Finding canonical forms for historical German text

Jurish 2008

Text Resources and Lexical Knowledge

Historical text presents numerous challenges for contemporary natural language processing techniques. In particular, the absence of consistent orthographic conventions in historical text presents difficulties for any system requiring reference to a static lexicon accessed by orthographic form. In this paper, we present three methods for associating unknown historical word forms with synchronically active canonical cognates and evaluate their performance on an information retrieval task over a manually annotated corpus of historical German verse.
