CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents

Büttner, Jochen; Martinetz, Julius; El-Hajj, Hassan; Valleriani, Matteo

doi:10.3390/jimaging8100285

Cited by 6 publications

(6 citation statements)

References 48 publications


“…Electronic copies of these books are available via the project's database, comprising over 70,000 pages, 23,000 of which contain visual elements. These visual elements were collected both manually and with the help of neural networks (Büttner et al, 2022). The Sphaera dataset is stored in a large knowledge graph modeled according to the CIDOC-CRM standards (Bekiari et al, 2021), where information about the editions, as well as fine-grained information about their content is stored (Kräutli & Valleriani, 2018; El-Hajj et al, 2022).…”

Section: Dataset (mentioning)

confidence: 99%

“…The second added class refers to illustrations of material objects, namely those which one can refer to as machines. This data was collected from Branca (1629); Zonca (1607); Ramelli (1588); Besson (1595) using CorDeep (https://cordeep.mpiwg-berlin.mpg.de), a web service designed to extract and classify visual elements from historical documents (Büttner et al, 2022). In total, the dataset contains 5,879 pages distributed across the four classes, as shown in Table 1; each class is represented by a single sample in Fig.…”

Section: Dataset (mentioning)

confidence: 99%

“…These are selected by domain experts and labeled accordingly. To facilitate this process, illustrations are extracted from full book pages using the automated image segmentation pipeline CorDeep (Büttner et al, 2022). A web service to extract visual elements from various input types including PDF and common image file formats is accessible via https://cordeep.mpiwg-berlin.…”

Section: Appendix A: Explainable AI (mentioning)

confidence: 99%

“…In historical document layout analysis in particular, e.g., in Xu et al (2018), the authors relied on a Multi-Task Fully Convolutional Network (FCN) to segment highly unstructured manuscript and printed-text pages into multiple semantically relevant groups (e.g., marginalia, main text, and comments), while Ravichandra et al (2022) opts for an object-detection based approach relying on the YOLO model (Redmon et al, 2015). Others have recognized the value of extracting images from historical documents due to their importance in transmitting the information and ideas contained in the texts, leading to approaches such as the FCN networks presented in Monnier and Aubry (2020) and the object detection-based methodologies applied to specific corpora adopted by Dutta et al (2021); Büttner et al (2022) from techniques like YOLO (Redmon et al, 2015), U-Net (Ronneberger et al, 2015), or Faster R-CNN (Ren et al, 2016). By getting closer to the textual content of these documents, numerous AI-based approaches for optical character recognition (OCR) and handwritten text recognition (HTR) have been proposed, with deep learning-based approaches (Jaderberg et al, 2016) setting new standards.…”

Section: Introduction (mentioning)

confidence: 99%


Explainability and transparency in the realm of digital humanities: toward a historian XAI

El-Hajj, Eberle, Merklein et al. 2023

Int J Digit Humanities

Self Cite


The recent advancements in the field of Artificial Intelligence (AI) translated to an increased adoption of AI technology in the humanities, which is often challenged by the limited amount of annotated data, as well as its heterogeneity. Despite the scarcity of data it has become common practice to design increasingly complex AI models, usually at the expense of human readability, explainability, and trust. This in turn has led to an increased need for tools to help humanities scholars better explain and validate their models as well as their hypotheses. In this paper, we discuss the importance of employing Explainable AI (XAI) methods within the humanities to gain insights into historical processes as well as ensure model reproducibility and a trustworthy scientific result. To drive our point, we present several representative case studies from the Sphaera project where we analyze a large, well-curated corpus of early modern textbooks using an AI model, and rely on the XAI explanatory outputs to generate historical insights concerning their visual content. More specifically, we show that XAI can be used as a partner when investigating debated subjects in the history of science, such as what strategies were used in the early modern period to showcase mathematical instruments and machines.


“…Afterwards, the results were injected into Kraken [23] to generate text lines and the OCR output. Likewise, Büttner et al [24] applied a YoloV5 network to detect graphical elements, such as initials, decorations, printer's marks, or content illustrations, in different historical documents.…”

Section: Instance-level Segmentation For Page Layout Analysis (mentioning)

confidence: 99%

Line-Level Layout Recognition of Historical Documents with Background Knowledge

2023


Digitization and transcription of historic documents offer new research opportunities for humanists and are the topics of many edition projects. However, manual work is still required for the main phases of layout recognition and the subsequent optical character recognition (OCR) of early printed documents. This paper describes and evaluates how deep learning approaches recognize text lines and can be extended to layout recognition using background knowledge. The evaluation was performed on five corpora of early prints from the 15th and 16th Centuries, representing a variety of layout features. While the main text with standard layouts could be recognized in the correct reading order with a precision and recall of up to 99.9%, also complex layouts were recognized at a rate as high as 90% by using background knowledge, the full potential of which was revealed if many pages of the same source were transcribed.


Prompt Me a Dataset: An Investigation of Text-Image Prompting for Historical Image Dataset Creation Using Foundation Models

El-Hajj, Valleriani 2024

Lecture Notes in Computer Science

