People nowadays prefer to use digital gadgets such as cameras or mobile phones to capture documents. Automatic extraction of panels and characters from comic document images is challenging due to the wide variety of drawing styles adopted by authors; such extraction benefits readers, who can then read comics on mobile devices at any time, and supports automatic digitization. Most existing methods for panel/character localization rely on connected component analysis or a page background mask and are applicable only to a limited set of comic datasets. This work proposes a panel/character localization architecture, based on features of YOLO and CNNs, for the extraction of both panels and characters from comic book images. The method achieved remarkable results on the Bengali Comic Book Image dataset (BCBId), which consists of 4130 images in total and was developed by us, as well as on a variety of publicly available comic datasets in other languages, i.e. eBDtheque, Manga 109 and the DCM dataset.
It is truly amazing that human beings can easily understand and comprehend the intended meaning of an ambiguous word. The meaning of an ambiguous word differs with its usage in different contexts, yet human beings can figure out the meaning with ease. We have machine translation (MT) systems that can translate from a source language to its equivalent target language. The main intention of these MT systems is to seamlessly transfer the intended meaning of the source text to the target text. But due to the ambiguous nature of natural language, MT systems suffer from setbacks, and word sense disambiguation (WSD) is one of the greatest challenges to overcome. Researchers have contributed a number of WSD algorithms that operate over textual data. These algorithms were primarily developed to disambiguate an ambiguous word, i.e. to determine its exact meaning based on the context. Context plays a decisive role in disambiguating an ambiguous word. A section of the research community is of the opinion that the neighbouring words that appear along with an ambiguous word in a sentence might help in finding its meaning or sense; this is commonly known as distributional semantics. In this paper, we propose a novel technique to resolve the ambiguity of polysemous nouns using a multimodal distributional semantics model (MDSM). The arduous task was to find a standard multimodal database for carrying out our desired experiments; this was addressed by using the ImageNet database. ImageNet is a large-scale database containing tens of millions of annotated images organized according to the semantic hierarchy of WordNet. Our MDSM exploits both the image features and the textual features of the annotated images in the ImageNet database. For both training and testing, we used a total of 8 different synsets. A total of 800 images related to these synsets were used for training (each synset contributes a reduced set of 100 images), while only 8 images (1 per synset) were used for testing. The 8 synsets that we have considered are {
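To illustrate the general idea behind a multimodal distributional approach to WSD, the following is a minimal, hypothetical sketch (not the authors' actual MDSM): each candidate sense is represented by fused text and image feature vectors, and the sense whose fused vector is closest (by cosine similarity) to the fused context vector is selected. The vectors, the example noun "crane", and the fusion weight `alpha` are all toy assumptions for illustration only.

```python
import math

def cosine(u, v):
    # cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def fuse(text_vec, image_vec, alpha=0.5):
    # toy multimodal fusion: weighted concatenation of the two modalities
    return [alpha * x for x in text_vec] + [(1 - alpha) * y for y in image_vec]

# hypothetical sense inventory for the polysemous noun "crane",
# each sense paired with made-up text and image feature vectors
senses = {
    "crane.n.01 (bird)":    fuse([0.9, 0.1], [0.8, 0.2]),
    "crane.n.02 (machine)": fuse([0.1, 0.9], [0.2, 0.9]),
}

def disambiguate(context_text_vec, context_image_vec):
    # pick the sense whose fused vector best matches the fused context
    query = fuse(context_text_vec, context_image_vec)
    return max(senses, key=lambda s: cosine(senses[s], query))

# a context whose toy features resemble the bird sense
print(disambiguate([0.85, 0.15], [0.75, 0.25]))  # → crane.n.01 (bird)
```

In a real system, the text vectors would come from a distributional model over the words co-occurring with the ambiguous noun, and the image vectors from features extracted from annotated images (e.g. from ImageNet); the nearest-sense selection step, however, follows the same pattern.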