Downtown Osaka Scene Text Dataset

Iwamura, Masatsugu; Matsuda, Takahiro; Morimoto, Naoko; Sato, Hitomi; Ikeda, Yuki; Kise, Koichi

doi:10.1007/978-3-319-46604-0_32

Cited by 25 publications

(16 citation statements)

References 35 publications

“…2 https://catalist-2021.github.io/ ReCTS-25k, CTW, and RRC-LSVT from ICDAR'19 Robust Reading Competition (RRC) [23,33,31,24]. Korean and Japanese scene-text recognition datasets involve KAIST and DOST [9,7]. Different English datasets are listed in the last row of Table 1 [30,28,20,13,16,10,17,27,3,15,14].…”

Section: Related Work (mentioning)

confidence: 99%

Towards Boosting the Accuracy of Non-Latin Scene Text Recognition

Gunna, Saluja, Jawahar

2022

Preprint

Scene-text recognition is remarkably better in Latin languages than the non-Latin languages due to several factors like multiple fonts, simplistic vocabulary statistics, updated data generation tools, and writing systems. This paper examines the possible reasons for low accuracy by comparing English datasets with non-Latin languages. We compare various features like the size (width and height) of the word images and word length statistics. Over the last decade, generating synthetic datasets with powerful deep learning techniques has tremendously improved scene-text recognition. Several controlled experiments are performed on English, by varying the number of (i) fonts to create the synthetic data and (ii) created word images. We discover that these factors are critical for the scene-text recognition systems. The English synthetic datasets utilize over 1400 fonts while Arabic and other non-Latin datasets utilize less than 100 fonts for data generation. Since some of these languages are a part of different regions, we garner additional fonts through a region-based search to improve the scene-text recognition models in Arabic and Devanagari. We improve the Word Recognition Rates (WRRs) on Arabic MLT-17 and MLT-19 datasets by 24.54% and 2.32% compared to previous works or baselines. We achieve WRR gains of 7.88% and 3.72% for IIIT-ILST and MLT-19 Devanagari datasets.

“…Arabic datasets like ARASTEC (260 images of signboards, hoardings, and advertisements) and ALIF (7k text images from TV Broadcast) also exist in the scene-text recognition community [29,32]. Korean and Japanese scene-text recognition datasets include KAIST (2,385 images from signboards, book covers, and English and Korean characters) and DOST (32k sequential images) [7,5]. The MLT dataset available from the ICDAR'17 RRC contains 18k scene images (around 1-2k images per language) in Arabic, Bangla, Chinese, English, French, German, Italian, Japanese, and Korean [15].…”

Section: Related Work (mentioning)

confidence: 99%

“…3 2 for the first five languages we discussed in the previous section (we notice that the last two languages also follow the similar trend). On the left, we show the frequency distribution of top-5 n-grams, (n ∈ [1,5]). On the right, we show the frequency distribution of all n-grams with n ∈ [1,5].…”

Section: Datasets and Motivation (mentioning)

confidence: 99%

Transfer Learning for Scene Text Recognition in Indian Languages

Gunna, Saluja, Jawahar

2022

Preprint

Scene text recognition in low-resource Indian languages is challenging because of complexities like multiple scripts, fonts, text size, and orientations. In this work, we investigate the power of transfer learning for all the layers of deep scene text recognition networks from English to two common Indian languages. We perform experiments on the conventional CRNN model and STAR-Net to ensure generalisability. To study the effect of change in different scripts, we initially run our experiments on synthetic word images rendered using Unicode fonts. We show that the transfer of English models to simple synthetic datasets of Indian languages is not practical. Instead, we propose to apply transfer learning techniques among Indian languages due to similarity in their n-gram distributions and visual features like the vowels and conjunct characters. We then study the transfer learning among six Indian languages with varying complexities in fonts and word length statistics. We also demonstrate that the learned features of the models transferred from other Indian languages are visually closer (and sometimes even better) to the individual model features than those transferred from English. We finally set new benchmarks for scene-text recognition on Hindi, Telugu, and Malayalam datasets from IIIT-ILST and Bangla dataset from MLT-17 by achieving 6%, 5%, 2%, and 23% gains in Word Recognition Rates (WRRs) compared to previous works. We further improve the MLT-17 Bangla results by plugging in a novel correction BiLSTM into our model. We additionally release a dataset of around 440 scene images containing 500 Gujarati and 2535 Tamil words. WRRs improve over the baselines by 8%, 4%, 5%, and 3% on the MLT-19 Hindi and Bangla datasets and the Gujarati and Tamil datasets.

“…The Street View Text (SVT) dataset [19] was harvested from "Google Street View" images. The Downtown Osaka Scene Text dataset consists of sequential images captured in shopping streets with an omnidirectional camera [20]. Finally, the Synthetic Word Dataset [21] [22] contains 9 million images covering English words and supports tasks in text recognition and segmentation.…”

Section: Previous Work (mentioning)

confidence: 99%