Digitale Infrastrukturen Für Die Germanistische Forschung 2018
DOI: 10.1515/9783110538663-011

10. Das Deutsche Textarchiv als Forschungsplattform für historische Daten in CLARIN

citations
Cited by 7 publications
(3 citation statements)
references
References 0 publications
“…The historical journal Die Grenzboten was the first full text transferred from the SuUB to CLARIN (Geyken et al, 2018). Die Grenzboten is a long running serial publication which can be classified as a literary journal that also covers politics and arts.…”
Section: Digitizing University Libraries As Full-text Providers For Clarin (mentioning)
confidence: 99%
“…For general English (GE), we use the Corpus of Late Modern English Texts (CLMET; Diller et al, 2011), spanning 1710−1920 with approximately 40 million tokens from several genres (e.g., narrative, drama). For German, texts from 1650−1900 are retrieved from the scientific (SG) and general language (GG) subcorpora of Deutsches Textarchiv (DTA, Geyken et al, 2018) respectively. Scientific German is represented with approximately 80 million tokens, general German with approximately 60 million tokens including non-fictional as well as fictional prose texts.…”
Section: Data (mentioning)
confidence: 99%
“…The four super-level categories for written language are taken from the DTA (Deutsches Textarchiv) (Geyken et al, 2011): Wissenschaft (science), Belletristik (literature), Zeitung (press) and Gebrauchstext (operative text). We add a Gesprochen (spoken) variety to also test our model on a different medium of communication.…”
Section: Corpus Design (mentioning)
confidence: 99%