People nowadays prefer to use digital gadgets such as cameras or mobile phones to capture documents. Automatic extraction of panels and characters from comic document images is challenging due to the wide variety of drawing styles adopted by authors; such extraction benefits readers, who can then read comics on mobile devices at any time, and supports automatic digitization. Most existing methods for panel/character localization rely on connected component analysis or a page background mask and are applicable only to a limited set of comic datasets. This work proposes a panel/character localization architecture, based on features of YOLO and CNNs, for the extraction of both panels and characters from comic book images. The method achieved remarkable results on the Bengali Comic Book Image dataset (BCBId), which consists of 4130 images in total and was developed by us, as well as on a variety of publicly available comic datasets in other languages, i.e. eBDtheque, Manga 109 and the DCM dataset.
It is truly amazing that human beings can easily understand and comprehend the intended meaning of an ambiguous word. The meaning of an ambiguous word differs with its usage in different contexts, yet human beings can figure out the meaning with ease. We have machine translation (MT) systems that can translate from a source language to its equivalent target language. The main intention of these MT systems is to seamlessly transfer the intended meaning of the source text to the target text. But due to the ambiguous nature of natural language, MT systems suffer from setbacks, and word sense disambiguation (WSD) is one of the greatest challenges to overcome. Researchers have contributed a number of WSD algorithms that operate over textual data. These algorithms were primarily developed to disambiguate an ambiguous word, i.e. to determine its exact meaning based on the context. Context plays a decisive role in disambiguating an ambiguous word. A section of the research community is of the opinion that the neighbouring words that appear along with an ambiguous word in a sentence might help in finding its meaning or sense; this is commonly known as distributional semantics. In this paper, we propose a novel technique to resolve the ambiguity of polysemous nouns using a multimodal distributional semantics model (MDSM). The arduous task was to find a standard multimodal database for carrying out our desired experiments; this was addressed by using the ImageNet database. ImageNet is a large-scale database containing tens of millions of annotated images organized according to the semantic hierarchy of WordNet. Our MDSM exploits both the image features and the textual features of the annotated images in the ImageNet database. For both training and testing, we used a total of 8 different synsets. A total of 800 images related to these synsets were used for training (each synset contributes a reduced set of 100 images), while only 8 images (1 per synset) were used for testing. The 8 synsets that we have considered are {
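To illustrate the general idea behind a multimodal distributional approach to WSD, the following is a minimal, hypothetical sketch (not the authors' actual MDSM): each candidate sense is represented by fused text and image feature vectors, and the sense whose fused vector is closest (by cosine similarity) to the fused context vector is selected. The vectors, the example noun "crane", and the fusion weight `alpha` are all toy assumptions for illustration only.

```python
import math

def cosine(u, v):
    # cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def fuse(text_vec, image_vec, alpha=0.5):
    # toy multimodal fusion: weighted concatenation of the two modalities
    return [alpha * x for x in text_vec] + [(1 - alpha) * y for y in image_vec]

# hypothetical sense inventory for the polysemous noun "crane",
# each sense paired with made-up text and image feature vectors
senses = {
    "crane.n.01 (bird)":    fuse([0.9, 0.1], [0.8, 0.2]),
    "crane.n.02 (machine)": fuse([0.1, 0.9], [0.2, 0.9]),
}

def disambiguate(context_text_vec, context_image_vec):
    # pick the sense whose fused vector best matches the fused context
    query = fuse(context_text_vec, context_image_vec)
    return max(senses, key=lambda s: cosine(senses[s], query))

# a context whose toy features resemble the bird sense
print(disambiguate([0.85, 0.15], [0.75, 0.25]))  # → crane.n.01 (bird)
```

In a real system, the text vectors would come from a distributional model over the words co-occurring with the ambiguous noun, and the image vectors from features extracted from annotated images (e.g. from ImageNet); the nearest-sense selection step, however, follows the same pattern.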