2014
DOI: 10.11649/cs.2014.008

The IMPACT project Polish Ground-Truth texts as a DjVu corpus

Abstract: The purpose of the paper is twofold. First, to describe the already implemented idea of DjVu corpora, i.e. corpora which consist of both scanned images and a transcription of the texts with the words associated with their occurrences in the scans. Secondly, to present a case study of a corpus consisting of almost 5 000 pages of Polish historical texts dating from 1570 to 1756 (it is practically the very first corpus of historical Polish). The tools described have universal character and are freely available un…

Cited by 5 publications (2 citation statements)
References 4 publications
“…The first historical corpus of Polish is a corpus of texts from the years 1572-1756 created by the IMPACT project (Bień, 2014). It contains 1.6 million tokens and comprises DjVu format scans linked to transliterated texts.…”
Section: Related Work (mentioning)
confidence: 99%
“…The texts prepared in this way were subjected to transcription (standardisation). For this, it was decided to use an existing tool developed for the transcription of Polish historical texts within the IMPACT project (Bień, 2014). The tool uses a set of rewrite rules based on regular expressions.…”
Section: Transcription (mentioning)
confidence: 99%
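
The last citation statement describes a transcription tool driven by rewrite rules based on regular expressions. A minimal sketch of that general technique follows. The three rules and the `transcribe` function are hypothetical and chosen only for illustration; they are not taken from the IMPACT tool, whose actual rule set is not given here.

```python
import re

# Hypothetical rewrite rules as (pattern, replacement) pairs, applied in order.
# They only illustrate the technique and are NOT the IMPACT project's rules.
RULES = [
    (re.compile("ſ"), "s"),                 # long s -> round s
    (re.compile("á"), "a"),                 # drop the acute accent on a
    (re.compile(r"x(?=[aeiouy])"), "ks"),   # x before a vowel -> ks
]


def transcribe(text: str) -> str:
    """Apply every rewrite rule to the text, in the order listed."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Because the rules run in sequence, a later rule sees the output of the earlier ones, so the order of the list is part of the rule set's meaning.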