Proceedings of the Third Italian Conference on Computational Linguistics CLiC-it 2016
DOI: 10.4000/books.aaccademia.1782

The DiDi Corpus of South Tyrolean CMC Data: A multilingual corpus of Facebook texts

Cited by 9 publications (5 citation statements)
References 1 publication
“…A number of resources have been produced for plurilingualism areas of Italy where South Tyrolean is spoken, such as a multilingual corpus of computer-mediated communication (Frey et al, 2016), and a longitudinal trilingual corpus of young learners (Glaznieks et al, 2022). Preliminary efforts such as a morphosyntactic specification for Resian (Erjavec, 2017), a lexical database for Sardinian, Gallurese and Sassarese (Angioni et al, 2018), and a tagset for Cimbrian varieties (Agosti et al, 2012) have also been carried out.…”
Section: NLP for Specific Varieties of Italy
Citation type: mentioning
confidence: 99%
“…Four freely available corpora of German CMC were used. First, the DiDi Corpus (Frey et al, 2016), consisting of Facebook status updates, comments, and chat messages of 136 different users. As the corpus contains different languages, only the German part was used, which amounts to 373,383 tokens from 130 authors.…”
Section: Data
Citation type: mentioning
confidence: 99%
“…The Institute for Applied Linguistics (IAL) at Eurac Research is currently investigating how it can move towards such a setup for more reproducibility in research as outlined in the previous section. One of the first corpora that was transformed into such a strictly versioned environment is the DiDi corpus (Frey et al, 2016). The corpus is available under an academic non-commercial (ACA-NC) license from an on-premise GitLab installation.…”
Section: Case Study: the DiDi Corpus
Citation type: mentioning
confidence: 99%