2019
DOI: 10.1007/978-3-030-30760-8_33
Clipping the Page – Automatic Article Detection and Marking Software in Production of Newspaper Clippings of a Digitized Historical Journalistic Collection

Abstract: This paper describes the use of article detection and extraction on the Finnish Digi 1 newspaper material of the National Library of Finland (NLF), using data from one newspaper, Uusi Suometar 1869-1918. We use the PIVAJ software [1] to detect and mark articles in our collection. From the separated articles we can produce automatic clippings for the user. The user can collect clippings for their own use both as images and as OCRed text. Together these functionalities improve usability of the digitized journ…

Cited by 6 publications (2 citation statements)
References 10 publications

“…But insufficient OCR, incomplete collections or unsatisfactory layout segmentation do not necessarily have to be accepted as an unchangeable fact. One of the options is to re‐OCR digitized documents if the quality of text recognition is not satisfactory (Kettunen et al, 2019; Neudecker et al, 2019). Transkribus 9 is a comprehensive platform for the recognition and transcription of historical documents, widely used in this task.…”
Section: Interdisciplinary Digital Hermeneutics In Action: Three Exam... (mentioning)
confidence: 99%
“…Also, scholars expressed a desire for "segmentation" of content such that they can search at individual-article granularity. While attempts exist to provide automated clippings from today's full-page entities (Kettunen et al, 2019), doing this is not easy. The individual articles are seldom enough in any case; scholars need their metadata.…”
Section: Implications and Future Research (mentioning)
confidence: 99%