Script identification in natural scene images is a key preliminary step for text recognition and an indispensable component of automatic text understanding systems designed for multilingual environments. In this paper, we present a novel framework for script identification that integrates a Local CNN and a Global CNN, both based on ResNet-20. We first extract a set of patches and segmented images according to the aspect ratios of the input images. These patches and segmented images are then used to train the Local CNN and the Global CNN, respectively. Finally, the AdaBoost algorithm combines the outputs of the two networks through decision-level fusion to produce the final result. Benefiting from this strategy, the Local CNN fully exploits the local features of the image, effectively revealing subtle differences among scripts that are difficult to distinguish, such as English, Greek, and Russian, while the Global CNN mines the global features of the image to further improve identification accuracy. Experimental results demonstrate that our approach performs well on four public datasets.
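The decision-level fusion step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the two networks' per-class softmax outputs are already available, simulates them with synthetic data (`noisy_softmax` and all sample counts are hypothetical), and uses scikit-learn's `AdaBoostClassifier` over the concatenated probability vectors as the decision-level fuser.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two networks' softmax outputs on a
# validation set of 200 images over 4 script classes.
n_samples, n_classes = 200, 4
labels = rng.integers(0, n_classes, size=n_samples)

def noisy_softmax(labels, noise):
    # Simulate per-image class-probability vectors that are mostly correct:
    # boost the true-class logit, then apply a softmax.
    logits = rng.normal(0.0, noise, size=(len(labels), n_classes))
    logits[np.arange(len(labels)), labels] += 2.0
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

local_probs = noisy_softmax(labels, 1.0)   # Local CNN (patch-level stream)
global_probs = noisy_softmax(labels, 1.0)  # Global CNN (whole-image stream)

# Decision-level fusion: concatenate both probability vectors and let
# AdaBoost learn how to weight the two streams when making the final call.
fused_features = np.hstack([local_probs, global_probs])
fuser = AdaBoostClassifier(n_estimators=50, random_state=0)
fuser.fit(fused_features, labels)
fused_acc = fuser.score(fused_features, labels)
```

The key design point is that fusion happens on decisions (class probabilities), not on raw features, so the two streams can be trained independently and combined afterwards.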
To automatically determine the number of clusters and generate higher-quality clusters when clustering data samples, we propose a harmonious genetic clustering algorithm, named HGCA, which is based on harmonious mating in eugenic theory. Unlike existing genetic clustering methods that rely on fitness alone, HGCA aims to select the most suitable mate for each chromosome, taking chromosome gender, age, and fitness into account when computing mating attractiveness. To avoid illegal mating, we design three mating prohibition schemes, i.e., no mating prohibition, mating prohibition based on lineal relativeness, and mating prohibition based on collateral relativeness, and three mating strategies for harmonious mating, i.e., a greedy eugenics-based mating strategy, a eugenics-based mating strategy using weighted bipartite matching, and a eugenics-based mating strategy using unweighted bipartite matching. In particular, a novel single-point crossover operator, called variable-length-and-gender-balance crossover, is devised to probabilistically guarantee both a balanced population gender ratio and dynamic chromosome lengths. We evaluate the proposed approach on real-life and artificial datasets, and the results show that our algorithm outperforms existing genetic clustering methods in terms of robustness, efficiency, and effectiveness.
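The greedy eugenics-based mating idea can be sketched as follows. The attractiveness formula, weights, and class fields below are illustrative assumptions, not the paper's definitions: the sketch only shows the mechanism of scoring opposite-gender pairs by fitness and age compatibility and greedily pairing the most attractive remaining couple.

```python
# Hypothetical chromosome for genetic clustering: a variable-length list
# of cluster centres plus gender/age attributes (field names illustrative).
class Chromosome:
    def __init__(self, genes, gender, age, fitness):
        self.genes, self.gender = genes, gender
        self.age, self.fitness = age, fitness

def attractiveness(a, b, w_fit=0.6, w_age=0.4):
    # Illustrative score: only opposite genders may mate; favour high
    # average fitness and a small age gap. The paper's exact weighting
    # is not reproduced here.
    if a.gender == b.gender:
        return 0.0
    return w_fit * (a.fitness + b.fitness) / 2 - w_age * abs(a.age - b.age)

def greedy_eugenic_pairs(population):
    # Greedy eugenics-based mating: repeatedly pick the most attractive
    # remaining couple until no positive-attractiveness pair is left.
    free, pairs = list(population), []
    while len(free) > 1:
        best = max(((a, b) for i, a in enumerate(free) for b in free[i + 1:]),
                   key=lambda p: attractiveness(*p))
        if attractiveness(*best) <= 0:
            break
        pairs.append(best)
        free.remove(best[0])
        free.remove(best[1])
    return pairs
```

The two bipartite-matching strategies mentioned in the abstract would replace this greedy loop with a matching over the same attractiveness scores.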
The choice of image feature representation plays a crucial role in the analysis of visual information. Although a vast number of robust feature representation models have been proposed to improve performance on different visual tasks, most existing feature representations (e.g., handcrafted features or Convolutional Neural Networks (CNNs)) have a relatively limited capacity to capture highly orientation-invariant (rotation/reversal) features. The net consequence is suboptimal visual performance. To address these problems, this study adopts a novel transformational approach that investigates the potential of polar feature representations. Our low-level representation consists of a histogram of oriented gradients, which is binned using annular spatial bin-type cells applied to the polar gradient. This provides gradient binning invariance for feature extraction, giving the descriptors significantly enhanced orientation-invariant capabilities. The proposed feature representation, termed orientation-invariant histograms of oriented gradients (Oi-HOG), is capable of accurately handling visual tasks such as facial expression recognition. In the context of the CNN architecture, we propose two polar convolution operations, referred to as Full Polar Convolution (FPolarConv) and Local Polar Convolution (LPolarConv), and use these to develop polar architectures for orientation-invariant CNN representations. Experimental results show that the proposed orientation-invariant image representation, based on polar models for both handcrafted and deep learning features, is competitive with state-of-the-art methods while maintaining a compact representation on a set of challenging benchmark image datasets.
Index Terms: rotation-invariant and reversal-invariant representation, HOG, CNN.
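The core idea of annular, polar-gradient binning can be sketched as follows. This is a simplified illustration of the principle, not the paper's Oi-HOG descriptor: gradient orientations are measured relative to the radial direction from the patch centre and pooled into ring-shaped spatial bins, so rotating the patch leaves the histogram essentially unchanged. All parameter names and bin counts are assumptions for the sketch.

```python
import numpy as np

def oi_hog_sketch(image, n_rings=3, n_orient_bins=8):
    # Illustrative orientation-invariant descriptor: gradients relative to
    # the radial direction, pooled into annular (ring) spatial bins.
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    orient = np.arctan2(gy, gx)

    h, w = image.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.mgrid[0:h, 0:w]
    radial_angle = np.arctan2(yy - cy, xx - cx)
    radius = np.hypot(yy - cy, xx - cx)

    # Orientation relative to the radial direction, wrapped to [0, 2*pi):
    # this is what makes the binning rotation-insensitive.
    rel = np.mod(orient - radial_angle, 2 * np.pi)

    ring = np.minimum((radius / (radius.max() + 1e-9) * n_rings).astype(int),
                      n_rings - 1)
    obin = np.minimum((rel / (2 * np.pi) * n_orient_bins).astype(int),
                      n_orient_bins - 1)

    # Magnitude-weighted 2-D histogram over (ring, relative orientation).
    hist = np.zeros((n_rings, n_orient_bins))
    np.add.at(hist, (ring, obin), mag)
    return (hist / (hist.sum() + 1e-9)).ravel()
```

For a square patch, a 90-degree rotation shifts both the gradient orientation and the radial angle by the same amount, so their difference, and hence the descriptor, is unchanged.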