Classifying heterogeneous visually rich documents is a challenging task. The difficulty increases further when the maximum allowed inference turnaround time is constrained by a threshold. The increased inference cost, weighed against the limited gain in classification capability, makes current multi-scale approaches infeasible in such scenarios. This work makes two major contributions. First, we propose a spatial pyramid model that extracts highly discriminative multi-scale feature descriptors from a visually rich document by leveraging the inherent hierarchy of its layout. Second, we propose a deterministic routing scheme that uses the spatial pyramid model to accelerate end-to-end inference. A depth-wise separable multi-column convolutional network is developed to enable our method. We evaluated the proposed approach on four publicly available benchmark datasets of visually rich documents. Results suggest that the proposed approach performs robustly compared to state-of-the-art methods in both classification accuracy and total inference turnaround time.
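The abstract above does not give layer sizes, so the following is only a minimal sketch of why a depth-wise separable convolution is cheaper than a standard one. The kernel size and channel counts (3, 64, 128) are illustrative assumptions, not values from the paper, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
print(std, sep, round(std / sep, 2))
```

For this assumed layer the separable form needs roughly an eighth of the weights, which is the kind of saving that matters when inference time is bounded.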
Convolutional neural networks (CNNs) have become one of the primary algorithms for various computer vision tasks. Handwritten character recognition is a typical example of such a task and has attracted considerable attention. CNN architectures such as LeNet and AlexNet have become very prominent over the last two decades; however, the spatial invariance of the different kernels remains a prominent issue. With the introduction of capsule networks, kernels can work in consensus with one another through dynamic routing, which combines the individual opinions of multiple groups of kernels, called capsules, to achieve equivariance among kernels. In the current work, we have applied a capsule network to handwritten Indic digit and character datasets to show its superiority over networks like LeNet. Furthermore, we show that capsule networks can boost the performance of other networks like LeNet and AlexNet.
Modern deep learning algorithms have given rise to various image segmentation approaches. Most of them, however, perform pixel-based segmentation, whereas superpixels provide a certain degree of contextual information while reducing computation cost. In our approach, we perform superpixel-level semantic segmentation, considering three levels of neighbours as semantic context. Furthermore, we employ a number of ensemble approaches such as max-voting and weighted average. We have also used the Dempster-Shafer theory of uncertainty to analyze confusion among the various classes. Our method outperforms a number of modern approaches on the same dataset.
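The two ensemble rules named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three per-model probability vectors for a single superpixel and the weights are made-up values, and the helper names are hypothetical.

```python
from collections import Counter


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def max_vote(labels):
    """Most frequent predicted label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def weighted_average(prob_rows, weights):
    """Blend per-model class-probability vectors using the given weights."""
    total = sum(weights)
    n_classes = len(prob_rows[0])
    return [
        sum(w * row[c] for row, w in zip(prob_rows, weights)) / total
        for c in range(n_classes)
    ]


# Hypothetical class probabilities from three models for one superpixel.
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]

voted = max_vote([argmax(p) for p in probs])        # majority of hard labels
blended = weighted_average(probs, [4, 1, 1])        # first model trusted most
print(voted, argmax(blended))
```

With these assumed numbers the two rules disagree: majority voting follows the two models that prefer class 1, while the weighted average follows the heavily weighted first model. That is the design trade-off between the two schemes.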
Identification of the minimum number of local regions of a handwritten character image, containing well-defined discriminating features which are sufficient for a minimal but complete description of the character, is a challenging task. A new region selection technique based on the idea of an enhanced Harmony Search methodology has been proposed here. The powerful framework of Harmony Search has been utilized to search the region space and detect only the most informative regions for correctly recognizing the handwritten character. The proposed method has been tested on handwritten samples of Bangla Basic, Compound and mixed (Basic and Compound) characters separately with an SVM based classifier using a longest run based feature-set obtained from the image subregions formed by a CG based quad-tree partitioning approach. Applying this methodology on the above mentioned three types of datasets, gains of 43.75%, 12.5% and 37.5% respectively have been achieved in terms of region reduction, and gains of 2.3%, 0.6% and 1.2% have been achieved in terms of recognition accuracy. The results show a sizeable reduction in the minimal number of descriptive regions as well as a significant increase in recognition accuracy for all the datasets using the proposed technique. Thus the time and cost related to feature extraction is decreased without dampening the corresponding recognition accuracy.
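The abstract does not spell out the partitioning step, so the sketch below shows only one plausible reading of a CG based quad-tree: each region is split into four at the centre of gravity (CG) of its foreground pixels, recursively. The integer rounding, the fallback for empty regions, the depth and the toy image are all assumptions made for illustration.

```python
def centre_of_gravity(img, top, left, bottom, right):
    """Integer mean (row, col) of foreground pixels in the region.

    Falls back to the geometric centre when the region has no foreground.
    """
    rows = cols = count = 0
    for r in range(top, bottom):
        for c in range(left, right):
            if img[r][c]:
                rows += r
                cols += c
                count += 1
    if count == 0:
        return (top + bottom) // 2, (left + right) // 2
    return rows // count, cols // count


def quad_tree(img, depth, top=0, left=0, bottom=None, right=None):
    """Return leaf regions (top, left, bottom, right) after `depth` CG splits."""
    if bottom is None:
        bottom = len(img)
    if right is None:
        right = len(img[0])
    if depth == 0:
        return [(top, left, bottom, right)]
    cr, cc = centre_of_gravity(img, top, left, bottom, right)
    regions = []
    for t, b in ((top, cr), (cr, bottom)):       # upper and lower halves
        for l, r in ((left, cc), (cc, right)):   # left and right halves
            regions.extend(quad_tree(img, depth - 1, t, l, b, r))
    return regions


# Toy 8 x 8 binary image (hypothetical data); depth 2 yields 4**2 leaf regions.
img = [[1 if (r + c) % 3 == 0 else 0 for c in range(8)] for r in range(8)]
regions = quad_tree(img, 2)
print(len(regions))
```

A region selection method such as the one described would then search over subsets of these leaf regions, keeping only those whose features are needed for recognition.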