A TEI Schema for the Representation of Computer-mediated Communication

Beißwenger, Michael; Ermakova, Maria; Geyken, Alexander; Lemnitzer, Lothar; Störrer, Angelika

doi:10.4000/jtei.476

Cited by 14 publications

(20 citation statements)

References 17 publications


“…Several studies suggest these messages bear characteristics that resemble spoken rather than written language (Chun, 1994; Lamy & Hampel, 2007). However, as Beißwenger et al. (2012) underline, one characteristic of synchronous text chat is that each message is posted as a block. Therefore, revisions to the message that are apparent to the other interlocutors cannot be made partway through the construction of the message.…”

Section: Introduction

mentioning

confidence: 99%

Interactions between text chat and audio modalities for L2 communication and feedback in the synthetic world Second Life

Wigham

Chanier

2013

Computer Assisted Language Learning



“…The domain of CMC is a recent and prominent example of an "area that the TEI has not yet envisioned". Several approaches to customise TEI P5 for the annotation of CMC corpora published in 2012, 2014 and 2016 (Beißwenger et al 2012, Chanier et al 2014, Margaretha & Lüngen 2014, Lüngen et al 2016) and discussed at TEI conferences and members' meetings have proven that TEI P5 provides a useful platform for the definition of representation schemas for CMC. Nevertheless, as long as a model for the representation of CMC is available "only" in form of TEI customisations, the official TEI standard is not up-to-date to cover this very prominent domain of discourse which, in the past few years, has become a subject of studies in a broad range of disciplines (within and beyond the humanities).…”

Section: Representing CMC in TEI: Fundamental Decisions and Considera…

mentioning

confidence: 99%

“…The four schemas have not been developed independently; instead, the schemas mark milestones on a path on which the previous schema, and the lessons learnt in using it for the representation of corpora, provided the basis for the development of the next schema in line. The schema developed in the context of a planned German reference corpus on CMC published in the TEI Journal ('DeRiK schema', Beißwenger et al 2012) as well as its variant adapted for representing a German Wikipedia corpus (Margaretha & Lüngen 2014) marked the initial points in that process. The French CoMeRe group around Thierry Chanier adapted and extended the DeRiK schema to represent 14 existing French CMC corpora on different CMC genres in a uniform and interoperable way ('CoMeRe schema', Chanier et al 2014).…”

mentioning

confidence: 99%

Le CMC-core : un schéma de représentation des corpus de la CMR en TEI

Beißwenger

Lüngen

2020

Corpus

Self Cite


In this paper we describe a schema and models which have been developed for the representation of corpora of computer-mediated communication (CMC corpora) using the representation framework provided by the Text Encoding Initiative (TEI). The schema presented here is the result of the activities and discussions within an international community of researchers who have been building, annotating and processing CMC data for integration into corpus infrastructures (CLARIN, ORTOLANG) and who use these corpora for linguistic research on variation and language change in and through the impact of internet-based communication technologies and applications. Discourse in the scope of CMC corpora (CMC = "computer-mediated communication") is characterised as dialogic, sequentially organised interchange between humans which is conducted using communication technologies such as chats, messengers and online forums; social media platforms and applications such as Twitter, Facebook, Instagram or WhatsApp; the communication functions of collaborative platforms and projects (e.g. in Wikipedia or in learning environments); or 3D environments (e.g. Second Life, gaming environments). Discourse found in CMC exhibits features that cannot be adequately handled by schemas and tools developed for the representation, annotation and processing of discourse that conforms to the written standard and the structural conventions of established text types (e.g., newspaper articles, prose, scientific articles). It also differs significantly from the language and structure of spoken conversation so that …

[Listing 3. A blog comment, replying to a previous comment. From the Scilogs corpus, adapted to CMC-core (Grumt Suárez et al. 2016).]

[Listing 4. Written and spoken post in WhatsApp chat interaction including an emoji, adapted to CMC-core. From the corpus MoCoDa2 (2018).]


“…We use the STTS tagset for annotation. Annotation rules for social media characteristics are given in [15], [16] and [17]. The general tagger model is taken from [4] and explained in Subsection III-A.…”

Section: Webtagger

mentioning

confidence: 99%

A POS Tagger for Social Media Texts Trained on Web Comments

Neunerdt

Reyer

Mathar

2013

Polibits


Abstract: Using social media tools such as blogs and forums has become more and more popular in recent years. Hence, a huge collection of social media texts from different communities is available for accessing user opinions, e.g., for marketing studies or acceptance research. Typically, methods from Natural Language Processing are applied to social media texts to automatically recognize user opinions. A fundamental component of the linguistic pipeline in Natural Language Processing is Part-of-Speech tagging. Most state-of-the-art Part-of-Speech taggers are trained on newspaper corpora, which differ in many ways from non-standardized social media text. Hence, applying common taggers to such texts results in performance degradation. In this paper, we present extensions to a basic Markov model tagger for the annotation of social media texts. Considering the German standard Stuttgart/Tübinger TagSet (STTS), we distinguish 54 tag classes. Applying our approach improves the tagging accuracy for social media texts considerably when we train our model on a combination of annotated texts from newspapers and Web comments.
