Creation of a High-quality, Register-diversified Parallel (English-Spanish) Corpus for Linguistic and Computational Investigations

Lavid, Julia; Arús, Jorge; Declerck, Bernard; Hoste, Véronique

doi:10.1016/j.sbspro.2015.07.443

Cited by 6 publications

(8 citation statements)

References 11 publications


“…In order to test the reliability and consistency of the core and the extended tagsets of the proposed annotation scheme, a pilot agreement study was performed by two expert annotators working on a small training corpus of eighteen comparable texts extracted from the larger set of bilingual newspaper texts contained in the MULTI-NOT corpus (Lavid et al 2015). The training corpus contained eighteen texts, evenly divided into comparable sets of English and Spanish news reports, editorials and letters to the editor.…”

Section: Agreement Study

mentioning

confidence: 99%

“…It is necessary, therefore, to empirically test the validity of these categories not only for theoretical purposes but also to be used for the annotation of large datasets, which can be later used for computational purposes such as Machine Learning, Automatic Text Classification, Multilingual Information Retrieval and Sentiment Analysis, among others. In this paper, we undertake this task in the context of the MULTINOT project, aimed at the creation and empirical validation of discourse features in English and Spanish through corpus analysis and annotation (see Lavid et al 2015; Lavid, Moratón 2016).…”

Section: Introduction

mentioning

confidence: 99%

“…The paper describes the process of validating a reliable annotation scheme for the categories of Stance and Engagement in English and Spanish using a bilingual sample of English-Spanish journalistic texts extracted from the MULTINOT corpus (Lavid et al 2015). The bilingual sample includes three different newspaper genres: news reports, editorials and letters to the editor.…”

mentioning

confidence: 99%


Stance and Engagement in English and Spanish Journalistic Texts: Towards a Reliable Annotation Scheme for Linguistic and Computational Purposes

Moratón, Lavid

2018

Self Cite


“…In this paper we describe the construction and empirical validation of a modality annotation scheme for English and Spanish in the context of the MULTINOT project, whose main aim is the development of a parallel English-Spanish corpus which is balanced - in terms of register diversity and translation directions - and whose design and enrichment with multiple layers of linguistic annotations focuses on quality rather than on quantity (see Lavid et al 2015).…”

Section: Introduction

mentioning

confidence: 99%

A linguistically-motivated annotation model of modality in English and Spanish: Insights from MULTINOT

Lavid, Carretero, Zamorano-Mansilla

2016

LiLT

Self Cite


In this paper we present current work on the design and validation of a linguistically-motivated annotation model of modality in English and Spanish in the context of the MULTINOT project. Our annotation model captures four basic modal meanings and their subtypes, on the one hand, and provides a fine-grained characterisation of the syntactic realisations of those meanings in English and Spanish, on the other. We validate the modal tagset proposed through an agreement study performed on a bilingual sample of four hundred sentences extracted from original texts of the MULTINOT corpus, and discuss the difficult cases encountered in the annotation experiment. We also describe current steps in the implementation of the proposed scheme for the large-scale annotation of the bilingual corpus using both automatic and manual procedures.


“…The MULTINOT corpus consists of originals and translated texts in both directions and is enriched with linguistic annotations which can be exploited in a number of linguistic, applied and computational contexts (see Lavid et al 2015).…”

Section: Introduction

mentioning

confidence: 99%

Designing and Validating an Annotation Model of Dynamic Modality for English and Spanish: Issues and Problems

Lavid, Carretero, Zamorano

EPiC Series in Language and Linguistics

Self Cite


In this paper we set forth an annotation model for dynamic modality in English and Spanish, given its relevance not only for contrastive linguistic purposes, but also for its impact on practical annotation tasks in the Natural Language Processing (NLP) community. An annotation scheme is proposed, which captures both the functional-semantic meanings and the language-specific realisations of dynamic meanings in both languages. The scheme is validated through a reliability study performed on a randomly selected set of one hundred and twenty sentences from the MULTINOT corpus, resulting in a high degree of inter-annotator agreement. We discuss our main findings and give attention to the difficult cases as they are currently being used to develop detailed guidelines for the large-scale annotation of dynamic modality in English and Spanish.

