2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR)
DOI: 10.1109/icfhr2020.2020.00052

Unsupervised Deep Learning for Handwritten Page Segmentation

Abstract: Segmenting handwritten document images into regions with homogeneous patterns is an important pre-processing step for many document image analysis tasks. Hand-labeling data to train a deep learning model for layout analysis requires significant human effort. In this paper, we present an unsupervised deep learning method for page segmentation, which removes the need for annotated images. A Siamese neural network is trained to differentiate between patches using their measurable properties, such as the number of foreground pixels…
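The approach described in the abstract can be illustrated with a minimal sketch: patch pairs are pseudo-labeled as similar or dissimilar by comparing a measurable property (here, the fraction of foreground pixels), and a small Siamese CNN is trained on those pairs with a contrastive loss, so no manual annotation is needed. The architecture, patch size, threshold, and names such as PatchSiamese are assumptions made for illustration, not the authors' exact configuration.

```python
# Minimal sketch of unsupervised Siamese training on page patches.
# All names, sizes, and thresholds are illustrative assumptions,
# not the authors' exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchSiamese(nn.Module):
    """Small CNN that embeds a binarized document patch."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, emb_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

def pseudo_label(patch_a, patch_b, thresh=0.15):
    """Label a pair as similar (1) if the foreground-pixel fractions of
    the two patches differ by less than `thresh`, else dissimilar (0).
    No manual annotation is involved."""
    fg_a = patch_a.mean(dim=(1, 2, 3))  # fraction of foreground pixels
    fg_b = patch_b.mean(dim=(1, 2, 3))
    return ((fg_a - fg_b).abs() < thresh).float()

def contrastive_loss(z_a, z_b, y, margin=1.0):
    """Standard contrastive loss on the pair embeddings."""
    d = F.pairwise_distance(z_a, z_b)
    return (y * d.pow(2) + (1 - y) * F.relu(margin - d).pow(2)).mean()

# One illustrative training step on random binarized 32x32 patches.
model = PatchSiamese()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
a = (torch.rand(8, 1, 32, 32) > 0.8).float()
b = (torch.rand(8, 1, 32, 32) > 0.5).float()
opt.zero_grad()
loss = contrastive_loss(model(a), model(b), pseudo_label(a, b))
loss.backward()
opt.step()
```

At inference time, the learned embedding can then be used to compare or group patches across the page; the paper's exact segmentation step may differ from this sketch.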

Cited by 10 publications (6 citation statements). References: 23 publications.
“…Finally, in [10] the authors tackle the challenge of limited ground truth availability by proposing an unsupervised deep learning approach for page segmentation. Their method involves the use of a Siamese neural network to differentiate between patches based on quantifiable properties, with a specific emphasis on the count of foreground pixels.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the past few years, this problem has been tackled by various authors [9][10][11], who developed a set of few-shot-learning-oriented frameworks specifically aimed at leveraging the small amount of data available to generate increasingly accurate predictions for the task at hand, producing results that are on par with or even surpass previously available state-of-the-art approaches that relied on much more data. In the present paper, we tackle the problem from another point of view by exploring different transfer learning approaches as a way to make good use of alternative data sources to pre-train our models.…”
Section: Introduction (mentioning)
confidence: 99%
“…Pertaining to historical documents, much research has been conducted recently on their analysis. In this regard, the two kinds of approaches are conventional methods [22][23][24][25] and deep-learning-based methods [26][27][28][29][30][31][32] for detecting text in old documents. Phan et al. [22] extracted characters by analyzing connected components.…”
Section: Related Work (mentioning)
confidence: 99%
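The connected-component extraction mentioned in the statement above (the approach attributed to Phan et al. [22]) can be illustrated with a short, self-contained sketch. The Otsu binarization, the minimum-area filter, and the use of OpenCV are assumptions made here for illustration and are not taken from the cited work.

```python
# Illustrative connected-component extraction from a document image.
import cv2
import numpy as np

def extract_components(gray, min_area=20):
    """Return bounding boxes of connected foreground components."""
    # Otsu binarization; text is assumed darker than the background.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            boxes.append((x, y, w, h))
    return boxes

# Usage on a small synthetic image.
img = np.full((64, 64), 255, np.uint8)
cv2.putText(img, "ab", (5, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, 0, 2)
print(extract_components(img))
```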
“…Further, despite the use of parameter sharing to speed up the training, it is still not sufficient for the character detection task because it still suffers from the mislocalization problem. Ahmad et al. [31] suggested a new page segmentation method that uses a Siamese network to find the difference between patches; the extracted features were then used to segment the page into main and side text regions, which means the authors handled the problem of pre-processing steps for document analysis without addressing the problem of word or character detection and recognition. In addition, considerable time was required to extract features for every possible patch.…”
Section: Related Work (mentioning)
confidence: 99%
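To make the patch-feature segmentation step described in the statement above concrete, the sketch below clusters per-patch features into two groups and maps the labels back onto the page grid. The patch size, the handcrafted feature vector, and the use of k-means (in place of the learned Siamese embedding) are all assumptions for illustration, not the authors' pipeline.

```python
# Sketch of turning per-patch features into a two-region segmentation
# (e.g., main vs. side text). Feature choice and clustering are assumed.
import numpy as np
from sklearn.cluster import KMeans

def segment_page(binary_page, patch=32):
    """Cluster non-empty patches into two groups and return a label map
    at patch resolution (-1 = empty patch, 0/1 = the two text regions)."""
    h, w = binary_page.shape
    rows, cols = h // patch, w // patch
    feats, coords = [], []
    for r in range(rows):
        for c in range(cols):
            p = binary_page[r*patch:(r+1)*patch, c*patch:(c+1)*patch]
            if p.any():  # skip empty patches
                # Toy feature vector: foreground density and projection spread.
                feats.append([p.mean(), p.std(),
                              p.sum(axis=0).std(), p.sum(axis=1).std()])
                coords.append((r, c))
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(np.array(feats))
    label_map = np.full((rows, cols), -1)
    for (r, c), lab in zip(coords, labels):
        label_map[r, c] = lab
    return label_map

# Usage on a random binarized page.
page = (np.random.rand(256, 256) > 0.9).astype(float)
print(segment_page(page))
```

Two clusters are used only to mirror the main/side-text split mentioned in the statement; the actual method presumably relies on the learned Siamese features rather than these handcrafted ones.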
“…Recently, character-level detection, driven by rapid progress in deep learning, has been handled as a feature extraction problem performed by a convolutional neural network (CNN). In this regard, hierarchical, sequence-based, and segmentation-based models [1, 2, 3] have been presented to compensate for the lack of datasets, and pre- and post-processing approaches [4] provide effective solutions for precise detection tasks. However, it is worth mentioning that these methods may suffer from over-segmentation errors, and they might inaccurately position characters in a document.…”
Section: Introduction (mentioning)
confidence: 99%