We have created Manga109, a dataset of a variety of 109 Japanese comic books publicly available for use for academic purposes. This dataset provides numerous comic images but lacks the annotations of elements in the comics that are necessary for use by machine learning algorithms or evaluation of methods. In this paper, we present our ongoing project to build metadata for Manga109. We first define the metadata in terms of frames, texts and characters. We then present our web-based software for efficiently creating the ground truth for these images. In addition, we provide a guideline for the annotation with the intent of improving the quality of the metadata.
Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the frames, speech texts, character faces, and character bodies; the total number of annotations exceeds 500k. This dataset provides numerous manga images and annotations, which will be beneficial for use in machine learning algorithms and their evaluation. In addition to academic use, we obtained further permission for a subset of the dataset for industrial use. In this article, we describe the details of the dataset and present a few examples of multimedia processing applications (detection, retrieval, and generation) that apply existing deep learning methods and are made possible by the dataset.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.