Proceedings of the 14th Conference on Computational Linguistics - 1992
DOI: 10.3115/992424.992468
Broad coverage automatic morphological segmentation of German words

Abstract: A system for the automatic segmentation of German words into morphs was developed. The main linguistic knowledge sources used by the system are a word syntax and a morph dictionary. The syntax is written in the formalism of right linear regular grammars and comprises approximately 1,400 rules describing the set of those sequences of morph classes which underlie syntactically well-formed words. The morph dictionary contains almost 11,000 morphs. Each morph is assigned to up to 6 morph classes. Statistical evalu…
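The interplay of the two knowledge sources named in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the five morphs, the class names (PREFIX, STEM, SUFFIX, INFL), the states, and the handful of rules below are all hypothetical stand-ins for the roughly 11,000 morphs and 1,400 rules the abstract mentions. The sketch assumes the right linear grammar is encoded as a finite-state transition table over morph classes, and that a morph may belong to several classes.

```python
from typing import Dict, List, Set, Tuple

# Toy morph dictionary: morph -> set of morph classes (all hypothetical).
MORPHS: Dict[str, Set[str]] = {
    "un": {"PREFIX"},
    "freund": {"STEM"},
    "lich": {"SUFFIX"},
    "keit": {"SUFFIX"},
    "en": {"INFL"},
}

# Toy right linear word syntax, written as a transition table:
# a rule "A -> class B" becomes (A, class) -> B.
RULES: Dict[Tuple[str, str], str] = {
    ("START", "PREFIX"): "START",
    ("START", "STEM"): "WORD",
    ("WORD", "STEM"): "WORD",    # compounding
    ("WORD", "SUFFIX"): "WORD",  # derivation
    ("WORD", "INFL"): "END",     # inflection closes the word
}
FINAL_STATES: Set[str] = {"WORD", "END"}

Segmentation = List[Tuple[str, str]]


def segment(word: str, state: str = "START") -> List[Segmentation]:
    """Return every segmentation of `word` that the toy syntax accepts."""
    if not word:
        # An empty remainder is accepted only in a final state.
        return [[]] if state in FINAL_STATES else []
    results: List[Segmentation] = []
    for end in range(1, len(word) + 1):
        morph = word[:end]
        # Try each class of the morph; sorted() keeps the output order stable.
        for morph_class in sorted(MORPHS.get(morph, ())):
            next_state = RULES.get((state, morph_class))
            if next_state is None:
                continue  # the syntax forbids this class in this position
            for rest in segment(word[end:], next_state):
                results.append([(morph, morph_class)] + rest)
    return results


if __name__ == "__main__":
    for analysis in segment("unfreundlichkeiten"):
        print(" + ".join(f"{m}/{c}" for m, c in analysis))
```

The dictionary proposes candidate morphs at each position, and the syntax filters out class sequences that do not form a well-formed word, so a string such as "lichfreund" is rejected even though both of its parts are in the dictionary.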

Cited by 5 publications
(5 citation statements)
references
References 2 publications
“…The latest version of annotated data when the article is prepared: Burmese: http://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/my-nova-170405.zip Khmer: http://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/km-nova-180803.zip 11. The modified short- and long-tags show similar tendency as those of the basic tags shown in Figure 7.…”
supporting
confidence: 60%
“…Compared with the non-Zipfian series of short-tag, it suggests that the bracketing annotation covers plenty of phenomena in a heavy-tailed distribution, rather than a simple classification of tokens. 11 Detailed distributions of tags and patterns are listed in Tables 2, 3, and 4. In Table 2 of short tags, it is obvious that n, v, and o tags nearly cover 90% of the tokens, which is in accordance with the…”
Section: Statistics and Examples On Annotated Data
mentioning
confidence: 99%
“…Teahan et al (2000) state that interpreting a text as a sequence of words is beneficial for some information retrieval and storage tasks: for example, full-text searches, word-based compression, and key-phrase extraction. According to Guo (1997), words and tokens are the primary building blocks in almost all linguistic theories and language-processing systems, including Japanese (Kobayasi, Tokumaga, and Tanaka 1994), Korean (Yun, Lee, and Rim 1995), German (Pachunke et al 1992), and English (Garside, Leech, and Sampson 1987), in various media, such as continuous speech and cursive handwriting, and in numerous applications, such as translation, recognition, indexing, and proofreading. The identification of words in natural language is nontrivial since, as observed by Chao (1968), linguistic words often represent a different set than do sociological words.…”
Section: Introduction
mentioning
confidence: 99%
“…Currently, word tokenization and segmentation problems exist in almost all natural languages such as Chinese (Chen and Liu 1992; Bai, 1995), Japanese (Yosiyuki, Takenobu and Hozumi 1992), Korean (Yun, Lee and Rim 1995), German (Pachunke, Mertineit, Wothke and Schmidt 1992) and English (Garside, Leech and Sampson 1987), in diverse media forms such as continuous speech recognition and handwriting recognition, and in numerous applications such as translation, recognition, indexing and proof-reading. Depending on the resources applied, word tokenization and segmentation solutions can be broadly categorized as either orthography-oriented or dictionary-based.…”
Section: Introduction
mentioning
confidence: 99%