Manga109 dataset and creation of metadata

Fujimoto, Azuma; Ogawa, Teiichiro; Yamamoto, Kazuyoshi; Matsui, Yusuke; Yamasaki, Toshihiko; Aizawa, Kiyoharu

doi:10.1145/3011549.3011551

Cited by 113 publications (66 citation statements)

References 3 publications

“…The network is trained on 800 training images and 5 validation images in the process. For testing we have used five standard benchmark datasets: Set5 [30], Set14 [31], B100 [32] [33], Urban100 [34] and Manga109 [35]. Set5 and Set14 has random images from animals to human faces.…”

Section: Prototyping (mentioning)

confidence: 99%

Deep Learning Based Single Image Super Resolution

Vaghela

2020

Preprint

“…The images were originally acquired for a project on generic shape matching and recognition. 18 Manga109 [82] A publicly available dataset of 109 Japanese comic books with numerous comic sketches.…”

Section: Live [81] (mentioning)

confidence: 99%

Multimedia super-resolution via deep learning: A survey

Hayat

2018

Digital Signal Processing


The recent phenomenal interest in convolutional neural networks (CNNs) must have made it inevitable for the super-resolution (SR) community to explore their potential. The response has been immense: in the three years since the advent of the pioneering work, so many works have appeared that a comprehensive survey is warranted. This paper surveys the SR literature in the context of deep learning. We focus on three important aspects of multimedia, namely images, video and multi-dimensional data, especially depth maps. In each case, the relevant benchmarks are first introduced in the form of datasets and state-of-the-art SR methods, excluding deep learning. Next comes a detailed analysis of the individual works, each including a short description of the method and a critique of the results with special reference to the benchmarking done. This is followed by a minimal overall benchmarking in the form of a comparison on some common datasets, relying on the results reported in the various works.

[Fig. 1: Backpropagation (after [7]).]

…acceptable level of convergence whereby the optimized parameters should ideally classify each subsequent test case correctly. The birth of convolutional neural networks (CNNs), or ConvNets, can be traced back to 1988 [15], when backpropagation was employed to train a NN to classify handwritten digits. Subsequent work by LeCun evolved into what later became known as LeNet5 [17]. After that there was a virtual lull until the late noughties [18], when GPUs became efficient enough to culminate in the work [19]. Since then the floodgates have opened, and we hear of various architectures such as AlexNet [20], ZFNet [21], GoogLeNet [22], DenseNet [23], etc.; for a detailed overview, see [18], [24]. The metamorphosis from a fully connected NN to a locally connected NN to a CNN is illustrated in Fig. 2.
As can be seen, rather than being fully connected, a CNN employs convolutions, leading to local connections in which each local region of the input is connected to a neuron in the output. The input to a CNN takes the form of multiple arrays, for example a color image with three 2D arrays (length × width) corresponding to the RGB or YCbCr channels. The number of channels is called the depth and constitutes the third dimension; more than three channels are not uncommon, e.g., in hyperspectral images. A CNN is made up of layers, each transforming an input 3D volume into an output 3D volume [11], typically via four distinct operations [25]: convolution, a non-linear activation function (ReLU), pooling or sub-sampling, and classification (a fully connected layer). A simplified CNN is illustrated in Fig. 3. A CNN can be described as several convolution layers with non-linear activation functions (e.g., ReLU or sigmoid) applied to each layer. Each convolution layer applies several (possibly thousands of) distinct filters, each producing a feature map, and combines their results. These filters are learned automatically during training based on the task at hand, e.g., if the task is image classification, the learning concerns a...
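The layer operations described above (convolution with several filters, a ReLU non-linearity, then pooling) can be sketched in plain Python to show how an H × W × C input volume becomes an output volume whose depth equals the number of filters. This is a minimal illustration under stated assumptions, not code from the survey: stride 1, no padding, 2 × 2 max pooling, and random rather than learned filter weights. The function names are hypothetical, and, as in most deep learning libraries, the "convolution" is computed as cross-correlation (the kernel is not flipped).

```python
import random


def shape(volume):
    """Return (height, width, depth) of a nested-list volume indexed [row][col][channel]."""
    return (len(volume), len(volume[0]), len(volume[0][0]))


def conv2d_layer(volume, filters, biases):
    """Stride-1, no-padding convolution (cross-correlation) of an H x W x C volume.

    Each filter is a kh x kw x C kernel and produces one output channel
    (a feature map), so the output depth equals len(filters).
    """
    h, w, c = shape(volume)
    kh, kw = len(filters[0]), len(filters[0][0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            pixel = []
            for kernel, bias in zip(filters, biases):
                # Weighted sum over the local kh x kw x C region: a local connection.
                acc = bias
                for di in range(kh):
                    for dj in range(kw):
                        for ch in range(c):
                            acc += volume[i + di][j + dj][ch] * kernel[di][dj][ch]
                pixel.append(acc)
            row.append(pixel)
        out.append(row)
    return out


def relu(volume):
    """Element-wise non-linearity max(0, x)."""
    return [[[max(0.0, v) for v in px] for px in row] for row in volume]


def max_pool2x2(volume):
    """Keep the maximum of each non-overlapping 2 x 2 window, channel by channel."""
    h, w, c = shape(volume)
    return [
        [
            [max(volume[i][j][k], volume[i][j + 1][k],
                 volume[i + 1][j][k], volume[i + 1][j + 1][k]) for k in range(c)]
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


random.seed(0)
# Hypothetical 8 x 8 RGB "image" and four random (untrained) 3 x 3 x 3 filters.
image = [[[random.random() for _ in range(3)] for _ in range(8)] for _ in range(8)]
filters = [[[[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
            for _ in range(3)] for _ in range(4)]
features = max_pool2x2(relu(conv2d_layer(image, filters, [0.0] * 4)))
print(shape(features))  # (3, 3, 4)
```

With an 8 × 8 × 3 input and four 3 × 3 × 3 filters, the valid convolution yields a 6 × 6 × 4 volume, and pooling halves the spatial size to 3 × 3 × 4. The depth of each output volume is set by the number of filters, not by the number of input channels.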


“…For example, given an album A, we use all the annotated images of this album from the eBDtheque dataset as testing set and collect other unannotated images of this album from other sources to build the training set. We selected the eBDtheque dataset because it provides text transcription for all images and from diverse writing styles (more representative than Manga109 dataset [3]). This dataset is composed by one hundred images containing 4691 annotated text lines.…”

Section: Dataset (mentioning)

confidence: 99%

Segmentation-Free Speech Text Recognition for Comic Books

Rigaud; Burie; Ogier

2017

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)


Speech text in comic books is written in a particular manner by the scriptwriter, which raises unusual challenges for text recognition. We first detail these challenges and present different approaches to solve them. We compare the performance of pre-trained OCR and a segmentation-free approach for speech text of comic books written in Latin script. We demonstrate that a few good-quality pre-trained OCR output samples, associated with other unlabeled data in the same writing style, can feed a segmentation-free OCR and improve text recognition. This is made possible by a lexicality measure that automatically accepts or rejects the pre-trained OCR output as pseudo ground truth for subsequent segmentation-free OCR training and recognition.
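The acceptance step mentioned in this abstract can be illustrated with a simple filter over OCR outputs. The sketch below is hypothetical: it assumes lexicality is measured as the share of tokens found in a lexicon and uses an arbitrary threshold of 0.8. The paper's actual measure may differ, and all function names, the lexicon and the sample strings are invented for illustration.

```python
import re


def lexicality(text, lexicon):
    """Hypothetical measure: fraction of letter-only tokens found in `lexicon` (case-insensitive)."""
    # [^\W\d_]+ matches runs of Unicode letters, so digits and symbols split tokens.
    words = [w.lower() for w in re.findall(r"[^\W\d_]+", text)]
    if not words:
        return 0.0
    return sum(w in lexicon for w in words) / len(words)


def select_pseudo_ground_truth(ocr_outputs, lexicon, threshold=0.8):
    """Keep (image_id, text) pairs whose OCR text looks lexical enough to reuse as labels."""
    return [(img, text) for img, text in ocr_outputs
            if lexicality(text, lexicon) >= threshold]


# Invented sample data: a tiny lexicon and three pre-trained OCR outputs.
lexicon = {"where", "is", "the", "treasure", "captain", "i", "do", "not", "know"}
ocr = [
    ("panel_01", "Where is the treasure, captain?"),
    ("panel_02", "I do n0t kn0w"),
    ("panel_03", "Tr3asur3!!"),
]
kept = select_pseudo_ground_truth(ocr, lexicon)
print(kept)  # [('panel_01', 'Where is the treasure, captain?')]
```

Only outputs that pass the threshold would be kept as pseudo ground truth for training the segmentation-free recognizer; a garbled output such as "I do n0t kn0w" scores about 0.33 and is rejected.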

