2015
DOI: 10.1016/j.sbspro.2015.07.443

Creation of a High-quality, Register-diversified Parallel (English-Spanish) Corpus for Linguistic and Computational Investigations

Abstract: This paper outlines current work on the construction of a high-quality, richly-annotated and register-diversified parallel corpus for the English-Spanish language pair, as currently carried out within the framework of the MULTINOT project. The corpus consists of original and translated texts in both directions and is designed as a multifunctional resource to be used in a number of disciplines such as corpus-based contrastive linguistic and translation studies, machine translation, computer-assisted translation…

Cited by 6 publications (8 citation statements). References 11 publications.
“…In order to test the reliability and consistency of the core and the extended tagsets of the proposed annotation scheme, a pilot agreement study was performed by two expert annotators working on a small training corpus of eighteen comparable texts extracted from the larger set of bilingual newspaper texts contained in the MULTI-NOT corpus (Lavid et al 2015). The training corpus contained eighteen texts, evenly divided into comparable sets of English and Spanish news reports, editorials and letters to the editor.…”
Section: Agreement Study; citation type: mentioning
confidence: 99%
“…It is necessary, therefore, to empirically test the validity of these categories not only for theoretical purposes but also to be used for the annotation of large datasets, which can be later used for computational purposes such as Machine Learning, Automatic Text Classification, Multilingual Information Retrieval and Sentiment Analysis, among others. In this paper, we undertake this task in the context of the MULTINOT project, aimed at the creation and empirical validation of discourse features in English and Spanish through corpus analysis and annotation (see Lavid et al 2015; Lavid, Moratón 2016).…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…In this paper we describe the construction and empirical validation of a modality annotation scheme for English and Spanish in the context of the MULTINOT project, whose main aim is the development of a parallel English-Spanish corpus which is balanced - in terms of register diversity and translation directions - and whose design and enrichment with multiple layers of linguistic annotations focuses on quality rather than on quantity (see Lavid et al 2015).…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…The MULTINOT corpus consists of originals and translated texts in both directions and is enriched with linguistic annotations which can be exploited in a number of linguistic, applied and computational contexts (see Lavid et al 2015).…”
Section: Introduction; citation type: mentioning
confidence: 99%