In this paper, we present a text-line segmentation method for historical documents. Historical documents are challenging because of their severe degradation, variation in writing styles, and diacritics. Based on these observations, we propose an effective approach to text-line segmentation that analyses the properties of document layouts. We combine the seam carving method with novel cost functions to accurately split text lines. Experiments were conducted on two challenging datasets of historical documents, namely the DIVA-HisDB dataset and our ChamDoc dataset. Our method achieved good results on the DIVA-HisDB dataset, with a Line IU of 99.36% and a Pixel IU of 98.86%. On the ChamDoc dataset, the proposed method outperformed the two baseline approaches, i.e., the seam carving-based and A* path planning methods, by a large margin.
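To make the seam carving idea concrete, the sketch below shows the generic dynamic-programming step that finds a minimum-cost left-to-right seam through a cost (energy) map. A seam of this kind can separate two adjacent text lines. This is an illustration under stated assumptions, not the authors' implementation: the function name `min_horizontal_seam` is hypothetical, and the binary "ink = 1, background = 0" map stands in for the paper's novel cost functions, which are not described here.

```python
from typing import List


def min_horizontal_seam(energy: List[List[float]]) -> List[int]:
    """Return, for each column, the row index of a minimum-cost left-to-right seam.

    Generic seam-carving dynamic programming: between adjacent columns the seam
    may stay in the same row or move one row up or down. The energy map is a
    hypothetical placeholder cost, NOT the paper's cost functions.
    """
    rows, cols = len(energy), len(energy[0])
    # cost[r][c]: cheapest seam starting in column 0 and ending at (r, c)
    cost = [[0.0] * cols for _ in range(rows)]
    # back[r][c]: row of the predecessor of (r, c) on that cheapest seam
    back = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        cost[r][0] = energy[r][0]
    for c in range(1, cols):
        for r in range(rows):
            best_r, best = r, cost[r][c - 1]
            for dr in (-1, 1):
                pr = r + dr
                if 0 <= pr < rows and cost[pr][c - 1] < best:
                    best_r, best = pr, cost[pr][c - 1]
            cost[r][c] = energy[r][c] + best
            back[r][c] = best_r
    # Start from the cheapest endpoint in the last column and trace back.
    r = min(range(rows), key=lambda i: cost[i][cols - 1])
    seam = [0] * cols
    for c in range(cols - 1, -1, -1):
        seam[c] = r
        r = back[r][c]
    return seam


# Toy page: rows 0 and 3 are text lines; the gap between them
# contains two stray ink pixels (e.g. a descender and a diacritic).
page = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
print(min_horizontal_seam(page))  # → [2, 2, 2, 1, 1]
```

In this toy example, the seam weaves around both stray pixels without crossing any ink, so the two text lines are split cleanly. Changing the energy map is how a method of this family encodes its own notion of where a line boundary should pass.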