Text extraction is the key step in the character recognition; its accuracy highly relies on the location of the text region. In this paper, we propose a new method which can find the text location automatically to solve some regional problems such as incomplete, false position or orientation deviation occurred in the low-contrast image text extraction. Firstly, we make some preprocessing for the original image, including color space transform, contrastlimited adaptive histogram equalization, Sobel edge detector, morphological method and eight neighborhood processing method (ENPM) etc., to provide some results to compare the different methods. Secondly, we use the connected component analysis (CCA) method to get several connected parts and non-connected parts, then use the morphology method and CCA again for the non-connected part to erode some noises, obtain another connected and non-connected parts. Thirdly, we compute the edge feature for all connected areas, combine Support Vector Machine (SVM) to classify the real text region, obtain the text location coordinates. Finally, we use the text region coordinate to extract the block including the text, then binarize, cluster and recognize all text information. At last, we calculate the precision rate and recall rate to evaluate the method for more than 200 images. The experiments show that the method we proposed is robust for low-contrast text images with the variations in font size and font color, different language, gloomy environment, etc.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.