Feature selection is critical in reducing the size of data and improving classifier accuracy by selecting an optimum subset of the overall features. Traditionally, each feature is given a score against a particular category (such as using Mutual Information) and the task of feature selection comes down to choosing the top k ranked features with the best average score across all categories. However, this approach has two major drawbacks. Firstly, the maximum or average score of a feature with a class might not necessarily determine its discriminating strength among samples of other classes. Secondly, most feature selection methods only use the scores to select the discriminating features from the corpus without taking into account the redundancy of information provided by the selected features. In this paper, we propose a new feature ranking score measure called the Discriminative Mutual Information (DMI) score. This score helps to select features that distinguish samples of one category against all other categories. Moreover, Non-Redundant Feature Selection (NRFS) heuristic is also proposed that explicitly takes the problem of feature redundancy into account when selecting the features set. The performance of our approach is investigated and compared with other feature selection techniques on datasets derived from high-dimensional text corpora using multiple classification algorithms. The results show that the proposed method leads to better classification micro-F1 score as compared to other state-of-the-art methods. In particular, the proposed method shows great improvement when the number of selected features are small as well as an overall higher robustness to label noise.
Ancient manuscripts are a rich source of history and civilization. Unfortunately, these documents are often affected by different age and storage related degradation which impinge on their readability and information contents. In this paper, we propose a document restoration method that removes the unwanted interfering degradation patterns from color ancient manuscripts. We exploit different color spaces to highlight the spectral differences in various layers of information usually present in these documents. At each image pixel, the spectral representations of all color spaces are stacked to form a feature vector. PCA is applied to the whole data cube to eliminate correlation of the color planes and enhance separation among the patterns. The reduced data cube, along with the pixel spatial information, is used to perform a pixel based segmentation, where each cluster represents a class of pixels that share similar color properties in the decorrelated color spaces. The interfering, unwanted classes can thus be removed by inpainting their pixels with the background texture. Assuming Gaussian distributions for the various classes, a Gaussian Mixture Model (GMM) is estimated through the Expectation Maximization (EM) algorithm from the data, and then used to find appropriate labels for each pixel. In order to preserve the original appearance of the document and reproduce the background texture, the detected degraded pixels are replaced based on Gaussian conditional simulation, according to the surrounding context. Experiments are shown on manuscripts affected by different kinds of degradations, including manuscripts from the DIBCO 2018 and 2019 publicaly available dataset. We observe that the use of a few PCA dominant components accelerates the clustering process and provides a more accurate segmentation.
Background cluttering badly affects the performance of Skin detection. In highly cluttered images, skin detection becomes more difficult and the algorithm can't differentiate between the skin and non-skin pixels. In this paper, we introduce saliency algorithm for removing the irrelevant information especially the skin like regions, in the background of the human images to tackle the background cluttering problem and improve the performance of skin detection algorithms in images with complex backgrounds. Extensive experimentation on highly cluttered and complex images shows that saliency algorithm further enhances the performance of skin detection algorithms not only in terms of false positive rate but in true positive rate, true negative, false negative rate, accuracy and precision too.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.