2021
DOI: 10.32714/ricl.09.01.07
|View full text |Cite
|
Sign up to set email alerts
|

The burden of legacy: Producing the Tagged Corpus of Early English Correspondence Extension (TCEECE)

Abstract: This paper discusses the process of part-of-speech tagging the Corpus of Early English Correspondence Extension (CEECE), as well as the end result. The process involved normalisation of historical spelling variation, conversion from a legacy format into TEI-XML, and finally, tokenisation and tagging by the CLAWS software. At each stage, we had to face and work around problems such as whether to retain original spelling variants in corpus markup, how to implement overlapping hierarchies in XML, and how to calcu… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
2

Relationship

0
2

Authors

Journals

citations
Cited by 2 publications
references
References 27 publications
0
0
0
Order By: Relevance