Multimedia content analysis is applied in many real-world computer vision applications, and digital images constitute a major part of multimedia data. In the last few years, the complexity of multimedia content, especially images, has grown exponentially, and millions of images are uploaded daily to platforms such as Twitter, Facebook, and Instagram. Searching an archive for a relevant image is a challenging research problem for the computer vision community. Most search engines retrieve images using traditional text-based approaches that rely on captions and metadata. Over the last two decades, extensive research has been reported on content-based image retrieval (CBIR), image classification, and image analysis. In CBIR and image classification models, high-level image visuals are represented as feature vectors of numerical values. Research shows that there is a significant gap between this image feature representation and human visual understanding. For this reason, research in this area focuses on reducing the semantic gap between image feature representation and human visual understanding. In this paper, we present a comprehensive review of recent developments in CBIR and image representation. We analyze the main aspects of various image retrieval and image representation models, from low-level feature extraction to recent semantic deep-learning approaches. The important concepts and major research studies on CBIR and image representation are discussed in detail, and we conclude with future research directions to inspire further research in this area.
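The retrieval step described above can be illustrated with a short sketch. It assumes each archive image has already been reduced to a fixed-length numerical feature vector (for example, a normalized histogram) and ranks images by L1 distance to the query vector. The function names `l1_distance` and `retrieve`, the image ids, and the choice of distance are illustrative assumptions, not a method taken from the review.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query, archive, k=3):
    """Return the ids of the k archive images whose vectors are closest to the query.

    `archive` maps a (hypothetical) image id to its precomputed feature vector.
    A smaller L1 distance is treated as higher visual similarity.
    """
    ranked = sorted(archive.items(), key=lambda item: l1_distance(query, item[1]))
    return [image_id for image_id, _ in ranked[:k]]


# Usage with three toy 3-bin histograms.
archive = {
    "img_a": [0.5, 0.5, 0.0],
    "img_b": [0.1, 0.1, 0.8],
    "img_c": [0.4, 0.6, 0.0],
}
print(retrieve([0.5, 0.4, 0.1], archive, k=2))  # prints ['img_a', 'img_c']
```

Real CBIR systems differ mainly in how the feature vector is built (color, texture, BoVW, or deep features) and in the similarity measure; the ranking loop itself stays this simple.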
Recently, face datasets containing celebrity photos with facial makeup have been growing at exponential rates, making recognition very challenging. Existing face recognition methods rely on feature extraction and reference reranking to improve performance. However, face images with facial makeup carry inherent ambiguity due to artificial colors, shading, contouring, and varying skin tones, making the recognition task more difficult. The problem is further confounded because makeup alters the bilateral size and symmetry of certain face components, such as the eyes and lips, affecting the distinctiveness of faces. The ambiguity becomes even worse when celebrities wear different makeup on different days, depending on the interpersonal context and current societal makeup trends. To cope with these artificial effects, we propose a deep convolutional neural network (dCNN) trained on an augmented face dataset to extract discriminative features from face images containing synthetic makeup variations. The augmented dataset, containing original face images and their synthetic makeup variations, allows the dCNN to learn face features across a variety of facial makeup. We also evaluate the role of partial and full makeup in face images in improving recognition performance. Experimental results on two challenging face datasets show that the proposed approach is competitive with the state of the art.
Effective image search, which motivates Content-Based Image Retrieval (CBIR), that is, the search for similar multimedia content based on a user query, remains an open research problem for computer vision applications. Bag of Visual Words (BoVW) based image representations are applied in object recognition, image classification, and content-based image analysis. In the BoVW model, local features extracted at interest points are quantized in the feature space, and the final histogram, or image signature, does not retain any detail about the co-occurrence of features in the 2D image space. This spatial information is crucial, as its absence adversely affects the performance of image classification models. The most notable contribution in this context is Spatial Pyramid Matching (SPM), which captures the absolute spatial distribution of visual words. However, SPM is sensitive to image transformations such as rotation, flipping, and translation; when images are not well aligned, SPM may lose its discriminative power. This paper introduces a novel approach to encoding relative spatial information in the histogram-based representation of the BoVW model. This is achieved by computing the global geometric relationship between pairs of identical visual words with respect to the centroid of an image. The proposed approach is evaluated on five datasets. Comprehensive experiments demonstrate the robustness of the proposed image representation compared to state-of-the-art methods in terms of precision and recall.
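One way to make the relative spatial encoding concrete is sketched below. The abstract does not specify the exact geometric relationship, so this sketch assumes it is the angle subtended at the centroid of the keypoint locations by each pair of occurrences of the same visual word, binned into a per-word histogram. The function name `pivw_angle_histograms` and the bin count are hypothetical. Because angles measured around the centroid do not change under translation, rotation, or flipping of the keypoints, this choice shows why a relative encoding can be more robust than the absolute grid used by SPM.

```python
import math
from collections import defaultdict
from itertools import combinations


def pivw_angle_histograms(keypoints, num_words, num_bins=8):
    """Per-word histograms of angles subtended at the keypoint centroid.

    keypoints: list of (x, y, word_id) tuples with word_id in [0, num_words).
    Returns a flat, L1-normalized list of num_words * num_bins values.
    Assumption: the "centroid of an image" is the mean keypoint location.
    """
    hist = [0.0] * (num_words * num_bins)
    if not keypoints:
        return hist
    cx = sum(x for x, _, _ in keypoints) / len(keypoints)
    cy = sum(y for _, y, _ in keypoints) / len(keypoints)

    # Group keypoint positions (relative to the centroid) by visual word.
    groups = defaultdict(list)
    for x, y, word in keypoints:
        groups[word].append((x - cx, y - cy))

    for word, points in groups.items():
        # Every pair of identical visual words contributes one angle.
        for (ax, ay), (bx, by) in combinations(points, 2):
            na, nb = math.hypot(ax, ay), math.hypot(bx, by)
            if na == 0 or nb == 0:
                continue  # angle is undefined for a point on the centroid
            cos_t = (ax * bx + ay * by) / (na * nb)
            theta = math.acos(max(-1.0, min(1.0, cos_t)))  # unsigned, in [0, pi]
            b = min(int(theta / math.pi * num_bins), num_bins - 1)
            hist[word * num_bins + b] += 1

    total = sum(hist)
    return [v / total for v in hist] if total else hist
```

Appending this vector to the ordinary BoVW histogram is one plausible way to combine appearance and spatial clues; the published method may bin or weight the relationship differently.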
Computer vision and digital image processing are applied in many domains, including automated production processes. In the textile industry, fabric defect detection is considered a challenging task because the quality and price of any textile product depend on the efficiency and effectiveness of automatic defect detection. Previously, defects in the fabric production process were detected through manual human effort. Lack of concentration, human fatigue, and time consumption are the main drawbacks of manual fabric defect detection. Applications based on computer vision and digital image processing can address these limitations. Over the last two decades, various computer vision-based approaches have been proposed in the research literature to address them. In this review article, we present a detailed study of computer vision-based approaches for detecting fabric defects in the textile industry. This study gives a detailed overview of histogram-based approaches, color-based approaches, image segmentation-based approaches, frequency domain operations, texture-based defect detection, sparse feature-based operations, image morphology operations, and recent trends in deep learning. The performance evaluation criteria for automatic fabric defect detection are also presented and discussed. The drawbacks and limitations of existing published research are discussed in detail, and possible future research directions are outlined. This study provides comprehensive details on computer vision and digital image processing applications for detecting different types of fabric defects.
Facial palsy caused by nerve damage results in loss of facial symmetry and expression. A reliable palsy grading system for large-scale applications is still missing from the literature. Although numerous approaches have been reported for facial palsy quantification and grading, most employ hand-crafted features on relatively small datasets, which limits classification accuracy due to non-optimal face representation. In contrast, convolutional neural networks (CNNs) automatically learn discriminative features, facilitating accurate classification of the underlying task. In this paper, we propose applying a typical deep network on a large dataset to extract palsy-specific features from face images. To mitigate overfitting, an inherent limitation that frequently occurs in CNNs, a generative adversarial network (GAN) is applied to augment the training dataset. The deeply learned features are then used to classify palsy into five benchmarked grades. Experimental results show that the proposed approach offers superior palsy grading performance compared to some existing methods. Such an approach is useful for large-scale palsy grading in settings such as primary health care.
The classification of high-resolution satellite images is an open research problem for the computer vision research community. In the last few decades, the Bag of Visual Words (BoVW) model has been used to classify satellite images. In the BoVW model, an orderless histogram of visual words without any spatial information is used as the image signature. The performance of the BoVW model suffers from this orderless nature, and adding spatial clues has been reported to benefit scene and geographic image classification. Most image representations that compute spatial information are not invariant to rotation, yet rotation invariance is considered one of the main requirements for satellite image classification. This paper presents a novel approach that computes spatial clues for BoVW histograms that are robust to image rotation. The spatial clues are obtained by computing histograms of orthogonal vectors, based on the magnitude of the orthogonal vectors between Pairs of Identical Visual Words (PIVW) relative to the geometric center of an image. A comparative analysis with recently proposed methods is performed to obtain the best spatial feature representation for satellite imagery. We evaluated the proposed approach for image classification on three standard remote sensing image benchmarks. The results show that the proposed approach achieves better classification accuracy on a variety of satellite image datasets.
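The abstract does not define the orthogonal vector precisely. The sketch below assumes it is the perpendicular from the geometric center of the image to the line through a pair of identical visual words, so its magnitude is the point-to-line distance; magnitudes are normalized by the center-to-corner distance and binned per visual word. The function name `orthogonal_vector_histograms` and all parameters are hypothetical. Because a rotation about the image center preserves distances from the center to any line, these magnitudes stay the same when the image is rotated about its center.

```python
import math
from collections import defaultdict
from itertools import combinations


def orthogonal_vector_histograms(keypoints, width, height, num_words, num_bins=5):
    """Per-word histograms of center-to-line distances for pairs of identical words.

    keypoints: list of (x, y, word_id) tuples with word_id in [0, num_words).
    Returns a flat, L1-normalized list of num_words * num_bins values.
    """
    cx, cy = width / 2.0, height / 2.0
    max_dist = math.hypot(cx, cy)  # center-to-corner distance, used to normalize
    groups = defaultdict(list)
    for x, y, word in keypoints:
        groups[word].append((x, y))

    hist = [0.0] * (num_words * num_bins)
    for word, points in groups.items():
        for (x1, y1), (x2, y2) in combinations(points, 2):
            dx, dy = x2 - x1, y2 - y1
            seg = math.hypot(dx, dy)
            if seg == 0:
                continue  # coincident points do not define a line
            # |cross(p2 - p1, c - p1)| / |p2 - p1| is the length of the
            # orthogonal vector from the center c to the line through p1 and p2.
            dist = abs(dx * (cy - y1) - dy * (cx - x1)) / seg
            b = min(int(dist / max_dist * num_bins), num_bins - 1)
            hist[word * num_bins + b] += 1

    total = sum(hist)
    return [v / total for v in hist] if total else hist
```

For a square image, rotating every keypoint by 90 degrees about the center, for example with `(x, y) -> (width - y, x)`, leaves the returned histogram unchanged.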
Recent technological developments have increased the complexity of image content, making image classification increasingly important. Digital images play a vital role in many applied domains, such as remote sensing, scene analysis, medical care, the textile industry, and crime investigation. Feature extraction and image representation are considered important steps in scene analysis, as they affect image classification performance. Automatic image classification is an open research problem for image analysis and pattern recognition applications. The Bag-of-Features (BoF) model is commonly used for image classification, object recognition, and other computer vision problems. In the BoF model, the final feature vector of an image contains no information about the co-occurrence of features in the 2D image space. This is a limitation, as the spatial arrangement of visual words in the image contains information that benefits image representation and the learning of a classification model. To address this, researchers have proposed different image representations. Among these, dividing the image space into geometric sub-regions and extracting a BoF histogram from each is considered a notable contribution for capturing spatial clues. With this in mind, we explore a Hybrid Geometric Spatial Image Representation (HGSIR) based on the combination of histograms computed over rectangular, triangular, and circular regions of the image. Five standard image datasets are used to evaluate the proposed approach. The quantitative analysis demonstrates that it outperforms state-of-the-art methods in terms of classification accuracy.
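A minimal sketch of a hybrid geometric histogram in the spirit of HGSIR follows. The abstract names the region shapes but not their layout, so the sketch assumes a 2x2 rectangular grid, the four triangles formed by the two image diagonals, and concentric rings around the image center. The function name `hgsir_like_histogram`, the ring count, and the final L1 normalization are all assumptions.

```python
import math


def hgsir_like_histogram(keypoints, width, height, num_words, num_rings=2):
    """Concatenate BoF histograms over rectangular, triangular, and circular regions.

    keypoints: list of (x, y, word_id) tuples with word_id in [0, num_words).
    Output length is (4 + 4 + num_rings) * num_words, L1-normalized.
    """
    cx, cy = width / 2.0, height / 2.0
    max_r = math.hypot(cx, cy)
    rect = [[0] * num_words for _ in range(4)]
    tri = [[0] * num_words for _ in range(4)]
    circ = [[0] * num_words for _ in range(num_rings)]

    for x, y, word in keypoints:
        # Rectangular: which quadrant of a 2x2 grid holds the keypoint.
        q = 2 * int(y >= cy) + int(x >= cx)
        rect[q][word] += 1
        # Triangular: which side of each image diagonal the keypoint lies on,
        # giving one of the four triangles that meet at the center.
        u, v = x / width, y / height
        t = 2 * int(v < u) + int(v < 1 - u)
        tri[t][word] += 1
        # Circular: which concentric ring around the center holds the keypoint.
        r = math.hypot(x - cx, y - cy) / max_r
        ring = min(int(r * num_rings), num_rings - 1)
        circ[ring][word] += 1

    feature = [count for region in rect + tri + circ for count in region]
    total = sum(feature)
    return [c / total for c in feature] if total else [0.0] * len(feature)
```

Each keypoint is counted once per geometry, so the three sub-histograms carry equal weight after normalization; weighting them differently is another reasonable design choice.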