2019 International Conference on Document Analysis and Recognition (ICDAR) 2019
DOI: 10.1109/icdar.2019.00164

Indiscapes: Instance Segmentation Networks for Layout Parsing of Historical Indic Manuscripts

Abstract: Historical palm-leaf manuscripts and early paper documents from the Indian subcontinent form an important part of the world's literary and cultural heritage. Despite their importance, large-scale annotated Indic manuscript image datasets do not exist. To address this deficiency, we introduce Indiscapes, the first ever dataset with multi-regional layout annotations for historical Indic manuscripts. To address the challenge of large diversity in scripts and presence of dense, irregular layout elements (e.g. text line…

Citations: cited by 24 publications (23 citation statements)
References: 36 publications (41 reference statements)
“…These networks can handle different layouts of printed documents, but require many training examples, more than 1,000 documents in these studies. To the best of our knowledge, the only attempt at applying object detection networks on historical documents was done by Prusty et al. [55]. They have trained Mask R-CNN on 120 to 350 documents to find instances of different page objects, such as text-lines and page boundaries, in historical Indic manuscripts.…”
Section: Neural Network-based Strategies (mentioning)
confidence: 99%
“…Extant copies of these early manuscripts written in Greek or Latin and usually dating from the 4th century to the 8th century AD, are classified according to their use of either all upper case or all lower case letters. Several researchers addressed the analysis and recognition of these documents; even considering only those published in the ICDAR 2019 proceedings we can count nine papers ([12, 13, 14, 15, 16, 17, 18, 19, 20]).…”
Section: Historical Documents (mentioning)
confidence: 99%
“…Different text lines are located in Indic historical documents in Reference [14] by using a deep model based on a Mask R-CNN with a ResNet-50 backbone. The different task of instance segmentation (that separates individual objects in the page, e.g., each text line) with respect to semantic segmentation (that aims at identifying pixels belonging to a given object type, e.g., text lines) is taken into account and discussed in the paper.…”
Section: Addressed Problems (mentioning)
confidence: 99%
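
The distinction drawn in the statement above, between instance segmentation (one mask per text line) and semantic segmentation (one mask per object type), can be illustrated with a small sketch. This is not the method of the cited paper: the masks, the grid size, and the helper names `to_semantic` and `count_components` are hypothetical and chosen only for illustration. The sketch shows that two touching text lines stay separate as instances but merge into a single region once collapsed into a semantic mask.

```python
from typing import List

Mask = List[List[int]]  # binary mask, 1 = pixel belongs to the object


def to_semantic(instances: List[Mask]) -> Mask:
    """Collapse per-instance masks into one 'text line' class mask (pixel-wise union)."""
    height, width = len(instances[0]), len(instances[0][0])
    semantic = [[0] * width for _ in range(height)]
    for mask in instances:
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    semantic[r][c] = 1
    return semantic


def count_components(mask: Mask) -> int:
    """Count 4-connected foreground regions in a binary mask using flood fill."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


# Hypothetical toy page: line_a has a descender at row 1 that touches line_b.
line_a = [[1, 1, 1, 1],
          [0, 0, 1, 0],
          [0, 0, 0, 0]]
line_b = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [1, 1, 1, 1]]
instances = [line_a, line_b]

semantic = to_semantic(instances)
print("instances kept separate:", len(instances))
print("regions recoverable from the semantic mask:", count_components(semantic))
```

Under these assumptions, the semantic mask alone cannot tell the two lines apart, which is one reason dense, irregular layouts are often framed as an instance segmentation problem.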