A survey of historical document image datasets

Nikolaidou, Konstantina; Seuret, Mathias; Mokayed, Hamam; Liwicki, Marcus

doi:10.1007/s10032-022-00405-8

Cited by 18 publications

(7 citation statements)

References 167 publications

“…LDM (Rombach et al 2022) proposes a cross-attention mechanism to incorporate the condition into the UNet and treats the diffusion process in the latent space. In text image generation, (Luhman and Luhman 2020; Gui et al 2023; Nikolaidou et al 2023) apply diffusion models to generate handwritten characters and demonstrate their promising effects. CTIG-DM (Zhu et al 2023) devises image, text, and style as conditions and introduces four text image generation modes in a diffusion model.…”

Section: Diffusion Model (mentioning)

confidence: 99%

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Yang, Peng, Kong et al. 2024, AAAI

Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory performance, they still struggle with complex characters and large style variations. To address these issues, we propose FontDiffuser, a diffusion-based image-to-image one-shot font generation method, which innovatively models the font imitation task as a noise-to-denoise paradigm. In our method, we introduce a Multi-scale Content Aggregation (MCA) block, which effectively combines global and local content cues across different scales, leading to enhanced preservation of intricate strokes of complex characters. Moreover, to better manage the large variations in style transfer, we propose a Style Contrastive Refinement (SCR) module, which is a novel structure for style representation learning. It utilizes a style extractor to disentangle styles from images, subsequently supervising the diffusion model via a meticulously designed style contrastive loss. Extensive experiments demonstrate FontDiffuser's state-of-the-art performance in generating diverse characters and styles. It consistently excels on complex characters and large style changes compared to previous methods. The code is available at https://github.com/yeungchenwa/FontDiffuser.

“…While page segmentation approaches, such as [7], return text/non-text masks, the latter of which often includes visual elements, this remains insufficient for any meaningful historical study, as such approaches often lack accurate visual element localization, as well as semantic classification of these elements. In this regard, one of the main hurdles hindering the success of semantic visual element recognition within historical documents is their high variability, as well as the general scarcity of coherent historical datasets focused on visual element recognition, with only 11 out of the 56 historical document datasets mentioned in [15] containing graphical elements. Additionally, in the majority of cases where visual elements were recorded in these datasets, they were not classified according to their semantic classes [16].…”

Section: State Of the Art (mentioning)

confidence: 99%

CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents

et al. 2022

Recent advances in object detection facilitated by deep learning have led to numerous solutions in a myriad of fields ranging from medical diagnosis to autonomous driving. However, historical research is yet to reap the benefits of such advances. This is generally due to the low number of large, coherent, and annotated datasets of historical documents, as well as the overwhelming focus on Optical Character Recognition to support the analysis of historical documents. In this paper, we highlight the importance of visual elements, in particular illustrations in historical documents, and offer a public multi-class historical visual element dataset based on the Sphaera corpus. Additionally, we train an image extraction model based on YOLO architecture and publish it through a publicly available web-service to detect and extract multi-class images from historical documents in an effort to bridge the gap between traditional and computational approaches in historical studies.

“…In contrast, modern learning-based methods exhibit the ability to infer styled glyphs, even in cases where they have not been directly observed in the reference style examples. Despite a few attempts to perform HTG with Diffusion Models [33], [34], the most typical strategy is to leverage GANs, which can be unconditioned in the case of non-stylized HTG or conditioned on a variable number of handwriting style samples in the case of stylized HTG.…”

Section: Related Work (mentioning)

confidence: 99%

“…However, there is no agreement on the split to adopt, i.e., on which authors and relative images should be included in the training set and which in the test. As a result, some works adopt the standard HTR split [1], [3], [11], [25], [34], [47], [48] (commonly known as Aachen split), while others [2], [24], [26], [27], [29], [33], [49]-[51] consider the original split proposed with the IAM dataset, which entails a different distribution of the authors between training and test. Moreover, the text content of the generated words and the style samples considered for each author in styled-HTG are usually selected randomly, thus further hindering the fair comparison also between approaches adopting the same IAM splitting.…”

Section: Related Work (mentioning)

confidence: 99%

“…The resulting textual content representation is then combined with the style features and fed to another model component that outputs the styled text image. For training HTG models, most of the State-of-the-Art approaches (apart from a few preliminary works exploiting Diffusion Models [33], [34]) follow the adversarial learning paradigm. This means that the HTG model is the generator, and another dedicated network is used as a discriminator to distinguish the writer's real images from the model's generated ones.…”

Section: Introduction (mentioning)

confidence: 99%

Detecting Tool Keypoints with Synthetic Training Data

Vanherle, Put, Michiels et al. 2022, Communications in Computer and Information Science

Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect: the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This study delves deeper into a cutting-edge Styled-HTG approach, proposing strategies for input preparation and training regularization that allow the model to achieve better performance and generalize better. These aspects are validated through extensive analysis on several different settings and datasets. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research: the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.
