2019
DOI: 10.1007/s10032-019-00332-1

A two-stage method for text line detection in historical documents

Abstract: This work presents a two-stage text line detection method for historical documents. Each detected text line is represented by its baseline. In the first stage, a deep neural network called ARU-Net labels each pixel as belonging to one of three classes: baseline, separator, or other. The separator class marks the beginning and end of each text line. The ARU-Net is trainable from scratch with manageably few manually annotated example images (fewer than 50). This is achieved by utilizing data augmentation strategies. The ne…
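As a rough illustration of the three-class output described in the abstract, the sketch below builds a toy label map and extracts baseline pixels from it. The class indices, array layout, and the simple scan are hypothetical conveniences for illustration; the paper's actual second stage is considerably more involved and this is not the authors' code.

```python
# Hypothetical 3-class label map (0 = other, 1 = baseline, 2 = separator),
# as a stand-in for the per-pixel prediction produced by the first stage.
OTHER, BASELINE, SEPARATOR = 0, 1, 2
H, W = 6, 12
pred = [[OTHER] * W for _ in range(H)]
for x in range(1, 11):
    pred[2][x] = BASELINE          # a single horizontal baseline
pred[2][0] = pred[2][11] = SEPARATOR   # separators mark line start and end

# Simplified second stage: collect baseline pixels; each detected text line
# is then represented by this list of baseline points.
baseline = [(x, y) for y in range(H) for x in range(W) if pred[y][x] == BASELINE]
print(len(baseline))   # baseline pixels found on row y = 2
```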

Cited by 109 publications (102 citation statements)
References 49 publications (79 reference statements)
“…Modern approaches have overcome these obstacles using deep learning, presenting solutions that are robust in the presence of noise, arbitrary document layouts, arbitrary orientation, and curved text lines. Grüning et al. [2] use an FCN for pixel labeling followed by post-processing to extract text lines from the pixel predictions. Wigington et al. [3] use an FCN to detect the beginning of text lines and have a network segment the line by stepping along it.…”
Section: A. Text Detection
confidence: 99%
“…The primary exception is comments, which are often oriented independently of the document. While work has been done to detect accurate bounding regions for skewed and even curved text [3], [2], we choose a simpler method that is robust to small amounts of skew and assumes straight lines.…”
Section: Detection
confidence: 99%
“…21 The technical partner for the development of the layout analysis, training, and recognition software is the CITlab 22 team at the University of Rostock, whose approach performed best on the subtask of detecting baselines, that is, the line supporting the main bodies of characters within a text line, in the competition on layout analysis for challenging medieval manuscripts at ICDAR 2017 [28]. Several related publications are available (see, for example, [29,30] for layout analysis and [31] for HTR), but to the best of our knowledge the exact state of the software actually incorporated in Transkribus is not publicly known. Therefore, the best source for results seems to be a recently published (May 2019) talk 23 which briefly sums up some evaluations: after training on close to 36,000 words corresponding to 182 pages, a CER of 3.1% and a WER of 13.1% were achieved on a dataset from the 18th century written by a single writer in German.…”
Section: Transkribus
confidence: 99%
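For context on the CER and WER figures quoted above: both are the edit (Levenshtein) distance between recognized and reference text, normalized by the reference length, computed over characters for CER and over words for WER. A minimal sketch (a generic illustration, not the evaluation code used in Transkribus):

```python
def edit_distance(ref, hyp):
    # Classic Levenshtein dynamic program over two sequences
    # (characters for CER, word lists for WER).
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

def cer(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

print(round(cer("text line", "test line"), 3))  # -> 0.111 (1 substitution / 9 chars)
```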
“…These models have been trained on a wide variety of books and typesets and, depending on the material used, can usually provide at least a valid starting point for manual GT production or even a satisfactory final result. OCR4all comes with four standard single models 30 which are automatically incorporated and made available when building the Docker image: antiqua_modern, antiqua_historical, fraktur_19th_century, and fraktur_historical. Since voting ensembles have proven to be very effective, we additionally provide a full set of model ensembles 31 consisting of five models for each of the four single-model areas mentioned above, which can be downloaded and added directly into OCR4all.…”
Section: Character Recognition
confidence: 99%
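The voting ensembles mentioned here combine the outputs of several recognition models. As a toy illustration of the idea, the sketch below takes a per-position majority vote over aligned model outputs; the inputs are hypothetical, real OCR voting (e.g. in Calamari, which OCR4all builds on) first aligns the confidence sequences rather than assuming equal-length strings:

```python
from collections import Counter

def vote(predictions):
    # predictions: one recognized string per ensemble model.
    # Simplifying assumption: outputs are already aligned character-by-character.
    assert len({len(p) for p in predictions}) == 1, "toy voter needs equal lengths"
    return "".join(Counter(chars).most_common(1)[0][0]
                   for chars in zip(*predictions))

# Five hypothetical model outputs for the same text line image.
models = ["fraktur", "fraktvr", "fraktur", "fcaktur", "fraktur"]
print(vote(models))  # -> "fraktur"
```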