2004
DOI: 10.1007/s11168-004-7431-3

TIGER: Linguistic Interpretation of a German Corpus

Abstract: This paper reports on the TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation, using the tool ANNOTATE, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER treebank. …

Cited by 159 publications (154 citation statements). References: 11 publications.

“…This is significantly higher as the ambiguity in German newspaper texts, e.g., 1 for the TIGER corpus containing 890,000 tokens. In order to provide a sufficiently large training data amount, we combine WebTrain with the TIGER treebank [19] newspaper text corpus. We call that joint-domain training.…”
Section: Results; mentioning (confidence: 99%)

“…For training and development, the TiGer syntactic treebank 2.2 (Brants et al 2004) was utilized, specifically the 5k train and dev set from the SPMRL 2014 shared task data version (Seddah et al 2014). Importantly, punctuation and other unattached elements are attached to the tree following Maier et al (2012), resolving crossing-branches (for a full description of the data preprocessing, see Seddah et al (2013b)).…”
Section: Methods; mentioning (confidence: 99%)

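The crossing branches mentioned in this citation arise when a constituent's terminals do not form a contiguous span of the sentence. The sketch below only *detects* such discontinuous nodes; it does not reproduce the Maier et al. (2012) attachment procedure or the SPMRL preprocessing. The tree encoding (a dict from hypothetical node ids to children, with terminals as integer positions) and the example analysis are illustrative assumptions, not data taken from TIGER.

```python
from typing import Dict, List, Set, Union

# A child is either a terminal position (int) or a nonterminal id (str).
# This encoding is a simplifying assumption, not a TIGER format.
Child = Union[int, str]


def terminal_yield(node: str, children: Dict[str, List[Child]]) -> Set[int]:
    """Collect the positions of all terminals dominated by `node`."""
    positions: Set[int] = set()
    for child in children[node]:
        if isinstance(child, int):
            positions.add(child)
        else:
            positions |= terminal_yield(child, children)
    return positions


def discontinuous_nodes(children: Dict[str, List[Child]]) -> List[str]:
    """Return ids of nonterminals whose yield is not a contiguous span."""
    result = []
    for node in children:
        span = terminal_yield(node, children)
        # A contiguous yield covers every position between its min and max.
        if max(span) - min(span) + 1 != len(span):
            result.append(node)
    return result


# Hypothetical, simplified analysis of "Darüber muss nachgedacht werden"
# (0=Darüber, 1=muss, 2=nachgedacht, 3=werden); not copied from TIGER.
tree = {
    "S": [1, "VP1"],
    "VP1": ["VP2", 3],
    "VP2": [0, 2],
}

print(discontinuous_nodes(tree))  # ['VP1', 'VP2']
```

Both VP nodes are reported because the finite verb *muss* (position 1) intervenes between *Darüber* and the rest of their yield; a parser restricted to context-free trees cannot represent these nodes without some transformation such as the one the citing authors describe.
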
“…An example of a less widely-used but still well-known parsed corpus is the TiGer corpus [11], of which the current version contains approximately 900 K words/50 K sentences of German newspaper text. TiGer is freely available as plain text for noncommercial, non-profit research purposes and in XML format with phrase-structure and dependency-structure representations.…”
Section: Syntactic Parse Trees; mentioning (confidence: 99%)
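
The XML distribution mentioned in this citation can be read with the Python standard library. The following is a minimal sketch, assuming the TIGER-XML layout as we understand it (`<s>` containing a `<graph>` with `<terminals>`/`<t>` tokens and `<nonterminals>`/`<nt>` nodes linked by `<edge label idref>`). The sentence, ids, and morphology strings are made up for illustration, and secondary edges (`<secedge>`) are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-sentence fragment in the style of TIGER-XML; the element
# and attribute names follow the format as we understand it, but the content
# is invented. The final period is deliberately left unattached.
SAMPLE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Peter" lemma="Peter" pos="NE" morph="Nom.Sg.Masc"/>
      <t id="s1_2" word="schläft" lemma="schlafen" pos="VVFIN" morph="3.Sg.Pres.Ind"/>
      <t id="s1_3" word="." lemma="." pos="$." morph="--"/>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_1"/>
        <edge label="HD" idref="s1_2"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def read_sentence(xml_text):
    """Return (tokens, edges) for one <s> element.

    tokens: list of (word, pos) pairs in sentence order.
    edges:  list of (parent category, edge label, child), where the child
            is a word for terminals and a category for nonterminals.
    """
    graph = ET.fromstring(xml_text.strip()).find("graph")
    terminals = {t.get("id"): t for t in graph.iter("t")}
    nonterminals = {nt.get("id"): nt for nt in graph.iter("nt")}
    tokens = [(t.get("word"), t.get("pos")) for t in graph.iter("t")]
    edges = []
    for nt in nonterminals.values():
        for edge in nt.findall("edge"):
            ref = edge.get("idref")
            if ref in terminals:
                child = terminals[ref].get("word")
            else:
                child = nonterminals[ref].get("cat")
            edges.append((nt.get("cat"), edge.get("label"), child))
    return tokens, edges


tokens, edges = read_sentence(SAMPLE)
print(tokens)  # [('Peter', 'NE'), ('schläft', 'VVFIN'), ('.', '$.')]
print(edges)   # [('S', 'SB', 'Peter'), ('S', 'HD', 'schläft')]
```

Keeping terminals and nonterminals in separate id maps mirrors the two-part layout of the graph and makes it easy to tell whether an edge points at a token or at another phrase.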