This paper introduces a new project, Digital Editions for Corpus Linguistics (DECL).
Introduction

The Digital Editions for Corpus Linguistics (DECL) project aims to create a framework for producing online editions of historical manuscripts suited to both corpus linguistic and historical research. This framework, consisting of a set of guidelines and associated tools, is designed especially for small projects and individual scholars. A completed DECL edition will, in effect, constitute a lightly annotated corpus text. In addition to a faithful graphemic transcription of the text itself, DECL editions will contain information about the underlying manuscript reality, including features such as layout and scribal annotation, together with a normalised version of the text. All of these features, encoded in standoff XML, can be used or ignored when searching or displaying the text.

DECL was formed in 2007 by three postgraduate students at the Research Unit for Variation, Contacts and Change in English (VARIENG) at the University of Helsinki. We shared a dissatisfaction with existing tools and resources, believing that digitised versions of historical texts and manuscripts generally failed to live up to expectations. At the same time, we recognised that digitisation was time-consuming and complicated, and that compromises had therefore been made in the creation of digital editions and corpora. To alleviate these problems, we began designing a user-friendly framework for creating linguistically oriented digital editions with existing standards, tools and solutions.

The first three DECL editions will form the bases for the authors' doctoral dissertations: a Late Medieval bilingual medical handbook (Alpo Honkapohja), a family of 15th-century culinary recipe collections (Ville Marttila), and a collection of early 17th-century intelligence letters (Samuli Kaislaniemi). Each edition will serve both as a template for the encoding guidelines for its particular text type and as a development platform for the common toolset.
The editions, along with a working toolset and guidelines, are scheduled to be available within the next five years.
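The idea of standoff annotation, in which layers such as normalisation are stored apart from the graphemic transcription and can be applied or ignored at display time, can be illustrated with a minimal Python sketch. The element names (`w`, `reg`), the `target` linking attribute and the sample Middle English tokens are all hypothetical and invented for illustration; they do not reflect the actual DECL encoding guidelines.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical base layer: a graphemic transcription, one <w> per token.
TRANSCRIPTION = """<text>
  <w xml:id="w1">Tak</w>
  <w xml:id="w2">þe</w>
  <w xml:id="w3">rotys</w>
</text>"""

# Hypothetical standoff layer: normalised forms kept in a separate document
# and linked to base tokens by id, so the transcription itself is untouched.
NORMALISATION = """<normalisation>
  <reg target="#w1">Take</reg>
  <reg target="#w2">the</reg>
  <reg target="#w3">roots</reg>
</normalisation>"""


def load_tokens(xml_text):
    """Return (id, form) pairs from the base layer in document order."""
    root = ET.fromstring(xml_text)
    return [(w.get(XML_ID), w.text) for w in root.iter("w")]


def load_layer(xml_text):
    """Map token ids to annotation values from a standoff layer."""
    root = ET.fromstring(xml_text)
    return {reg.get("target").lstrip("#"): reg.text for reg in root.iter("reg")}


def render(tokens, layer=None):
    """Render the text, substituting layer values only when a layer is given."""
    layer = layer or {}
    return " ".join(layer.get(tok_id, form) for tok_id, form in tokens)


if __name__ == "__main__":
    tokens = load_tokens(TRANSCRIPTION)
    print(render(tokens))                             # Tak þe rotys
    print(render(tokens, load_layer(NORMALISATION)))  # Take the roots
```

Because the normalisation lives in its own document and points at token ids, a search tool could query either the diplomatic forms or the normalised forms without the two layers ever being merged in the source file.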