Natural Scene Text Recognition Based on Encoder-Decoder Framework

Zuo, Ling-Qun; Sun, Hong-Mei; Mao, Qi-Chao; Qi, Rong; Jia, Rui-Sheng

doi:10.1109/access.2019.2916616

Cited by 54 publications

(20 citation statements)

References 19 publications

“…The overall classification accuracy in the article is obtained by calculating the confusion matrix from the verification sample. Many studies have demonstrated that adding texture features can improve classification accuracy [61,62]. The green line refers to classification accuracy with additional parameters such as Normalized Difference Water Index (NDWI), Radio Vegetation Index (RVI), NDVI, Enhanced Vegetation Index (EVI), and Normalized Difference Building Index (NDBI), the red line refers to classification accuracy after selection using texture features, optimal windows, climatic factors, and feature parameters ( Figure 6).…”

Section: (1) Analysis Of 33-years Preliminary Classification Results (mentioning)

confidence: 99%

Expansion of Urban Impervious Surfaces in Xining City Based on GEE and Landsat Time Series Data

et al. 2020

Urban expansion is often studied in large cities such as Beijing, Shanghai, and Guangzhou, while scant attention is paid to smaller cities such as Xining. However, Xining is the largest city on the Tibetan Plateau, and an important city in China's "Belt and Road Initiative". As its economy and society develops, Xining will play an increasingly important role in connecting the central and western regions. In order to quantify the impacts of rapid urbanization, it is extremely important to collect data on the time and space variations of impervious surfaces. As such, we collected Landsat long-term sequence data about Xining City from 1987-2019 using the random forest method, and then optimized the feature parameters to obtain the dataset. Our results demonstrated that the overall accuracy of land use classification in Xining city is 83.4% and that the urban impervious surface accuracy is 89.5%. Additionally, the overall accuracy improved by 2.4% after optimizing the characteristic parameters, while the urban impervious surface accuracy is 92.8%. In 27 of the 33 years we studied, the classification accuracy of impervious surfaces exceeded 90%. After correcting for the temporal consistency check, the accuracy of impervious surfaces improved by 2% compared to the original sequence. We analyzed the change of impervious surfaces in Xining based on the results of the final dataset and found that the impervious surface area of Xining increased from 55 km² in 1987 to 334 km² in 2019. Xining is a typical semi-open river valley city which shares spatial and temporal characteristics with other urban centers. The spatial and temporal characteristics of the expansion of urban spaces in the main urban area of Xining are obvious and are primarily spread around the central area toward tree branch shaped road, which help other cities located in river valleys better understand how urbanization progresses.

“…Emerging memory-efficient deep neural network architectures are capable of storing contextual information and process the sequence of features more efficiently. In [5], CNN based encoder/decoder architecture is proposed to extract and recognize the ordered feature sequence. Text is recognized by the Bidirectional long short-term memory (Bi-LSTM) network.…”

Section: Review Of Scene Text Recognition Methods (mentioning)

confidence: 99%

“…The technique is simple and cost-effective. The variants of RNN such as LSTM and BLSTM [5,6,7,10] are memory efficient and capable to store contextual information for a longer duration. In any word recognition problem, the contextual information of the previous and next character is equally important and hence BLSTM architectures are gaining more popularity.…”

Section: Review Of Scene Text Recognition Methods (mentioning)

confidence: 99%

Recognition of Devanagari Scene Text Using Autoencoder CNN

Shiravale, Jayadevan, Sannakki

2021

ELCVIA

Scene text recognition is a well-rooted research domain covering a diverse application area. Recognition of scene text is challenging due to the complex nature of scene images. Various structural characteristics of the script also influence the recognition process. Text and background segmentation is a mandatory step in the scene text recognition process. A text recognition system produces the most accurate results if the structural and contextual information is preserved by the segmentation technique. Therefore, an attempt is made here to develop a robust foreground/background segmentation(separation) technique that produces the highest recognition results. A ground-truth dataset containing Devanagari scene text images is prepared for the experimentation. An encoder-decoder convolutional neural network model is used for text/background segmentation. The model is trained with Devanagari scene text images for pixel-wise classification of text and background. The segmented text is then recognized using an existing OCR engine (Tesseract). The word and character-level recognition rates are computed and compared with other existing segmentation techniques to establish the effectiveness of the proposed technique.

“…Causes artificial effects such as sawtooth, ringing interference; The reconstruction-based method [9] is based on a specific degradation model to provide constraints on high-resolution image reconstruction based on the observed low-resolution image sequence, and then fuses different information of the same scene to obtain high quality. The reconstruction results can better suppress the artificial effects, but also cause the loss of detailed information, and the method is complicated in operation, difficult to guarantee accuracy, and low in efficiency; With the optimization of processor performance, convenient conditions have been provided for the field of big data and artificial intelligence, and deep learning applications have become more widespread [10], [11]. Learning-based algorithms are currently hotspots in the field of super-resolution [12], the algorithm learns the mapping relationship between the high-resolution image and the low-resolution image by extracting features, and finally realizes image reconstruction.…”

Section: Introduction (mentioning)

confidence: 99%

Super-Resolution Reconstruction Method of Remote Sensing Image Based on Multi-Feature Fusion

Huang, Jing

2020

IEEE Access

The acquisition of remote sensing images is affected by imaging equipment and environmental conditions. Usually on lower performance devices, the resolution of the acquired images is also low. Among many methods, the super-resolution reconstruction method based on generative adversarial networks has obvious advantages over previous network models in reconstructing image texture details. However, it is found in experiments that not all of these reconstructed textures exist in the image itself. Aiming at the problem of whether the texture details of the reconstructed image are accurate and clear, we propose a super-resolution reconstruction method combining wavelet transform and generative adversarial network. Using wavelet multi-resolution analysis, training wavelet decomposition coefficients in the generative adversarial network can effectively improve the local detail information of the reconstructed image. Experimental results show that our method can effectively reconstruct more natural image textures and make the images more visually clear. In the remote sensing image test set, the four indicators of the algorithm, peak signal to noise ratio (PSNR), structural similarity (SSIM), Feature Similarity (FSIM) and Universal Image Quality (UIQ) are slightly better than the algorithms mentioned in the article.
