This paper addresses the problem of semantic-based image retrieval of natural scenes. A typical content-based image retrieval system treats the query image and the images in the dataset as collections of low-level features and returns a ranked list of images based on the similarity between the features of the query image and those of the dataset images. However, the top-ranked images in the retrieved list, despite their high feature similarity to the query image, may differ from it in the user's semantic interpretation; this mismatch is known as the semantic gap. To reduce the semantic gap, this paper investigates how natural scene retrieval can be performed using the bag of visual words model and the distribution of local semantic concepts. The paper studies the effectiveness of different approaches to representing the semantic information depicted in natural scene images for image retrieval. Extensive experiments were conducted to evaluate the use of semantic information as well as the bag of visual words model for natural and urban scene image retrieval.
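The ranking step of such a retrieval system can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes images are already represented as L1-normalised feature histograms and uses histogram intersection as the similarity measure.

```python
import numpy as np

def retrieve(query_hist, dataset_hists, top_k=5):
    """Rank dataset images by histogram-intersection similarity to the query.

    query_hist    : (d,) L1-normalised feature histogram of the query image
    dataset_hists : (n, d) histograms of the n dataset images
    Returns the indices of the top_k most similar images and their scores.
    """
    sims = np.minimum(dataset_hists, query_hist).sum(axis=1)  # intersection score per image
    order = np.argsort(-sims)                                 # sort descending by similarity
    return order[:top_k], sims[order[:top_k]]
```

With normalised histograms the intersection score lies in [0, 1], reaching 1 only for an identical histogram, which makes the ranking easy to interpret.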
The problem of image annotation has gained increasing attention from researchers in computer vision, yet few works have addressed the use of bag of visual words for scene annotation at the region level. The aim of this paper is to study the relationship between the distribution of local semantic concepts and the local keypoints located in image regions labelled with these concepts. Based on this study, we investigate whether the bag of visual words (BoW) model can efficiently represent the content of natural scene image regions, so that images can be annotated with local semantic concepts. The paper also presents a local-from-global approach, which studies the influence of using visual vocabularies generated from general scene categories to build bags of visual words at the region level. Extensive experiments are conducted on a natural scene dataset with six categories. The reported results show the plausibility of using the BoW model to represent the semantic information of image regions.
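The BoW pipeline described in these two abstracts can be sketched in two steps: cluster local descriptors into a visual vocabulary, then describe each image region by a histogram of visual-word assignments. The sketch below is a simplified illustration, assuming local descriptors (e.g., SIFT-like vectors) are already extracted, and uses a naive k-means rather than any particular library implementation.

```python
import numpy as np

def build_vocabulary(descriptors, k=100, iters=20, seed=0):
    """Naive k-means over local descriptors to form a visual vocabulary."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        # assign each descriptor to its nearest center
        dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)  # recompute center
    return centers

def bow_histogram(region_descriptors, vocabulary):
    """Histogram of visual-word assignments for one image region."""
    dists = np.linalg.norm(region_descriptors[:, None] - vocabulary[None], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()  # normalise so regions of different sizes compare
```

In the local-from-global setting, `build_vocabulary` would be run on descriptors pooled from whole scene images, while `bow_histogram` is computed only from the keypoints falling inside a labelled region.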
Defect density is an essential aspect of software testing and maintenance that determines the quality of software products. It is used as a management factor to distribute limited human resources effectively. The availability of public defect datasets facilitates building defect density prediction models from established static code metrics. Since the data gathered for software modules are often subject to uncertainty, it is difficult to deliver accurate and reliable predictions. To alleviate this issue, we propose a new prediction model that integrates gray system theory and fuzzy logic to handle the imprecision in software measurement. We propose a new similarity measure that combines the benefits of fuzzy logic and gray relational analysis. The proposed model was validated against existing defect density prediction models on public defect datasets. The defect density variable is frequently sparse because of the large number of non-defective modules in the datasets; therefore, we also evaluate the proposed model's performance across sparsity levels. The findings reveal that the developed model surpasses other defect density prediction models on datasets with high and very high sparsity ratios. Ensemble learning techniques are competitive alternatives to the proposed model when the sparsity ratio is relatively small. On the other hand, statistical regression models were the least adequate methods for such problems and datasets. Finally, the proposed model was evaluated under different degrees of uncertainty using a sensitivity analysis procedure. The results showed that our model behaves stably under different degrees of uncertainty.
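The gray relational analysis component can be illustrated with the textbook gray relational grade. Note that this is the standard formulation only, not the paper's combined fuzzy-gray similarity measure, and the function name and interface are illustrative.

```python
import numpy as np

def gray_relational_grades(reference, candidates, zeta=0.5):
    """Standard gray relational grade of each candidate series with respect
    to a reference series (zeta is the distinguishing coefficient, usually 0.5).

    reference  : (d,) reference module's metric vector
    candidates : (n, d) metric vectors of the compared modules
    """
    diffs = np.abs(candidates - reference)      # deviation sequences Delta_i(k)
    dmin, dmax = diffs.min(), diffs.max()       # global min and max deviations
    coeffs = (dmin + zeta * dmax) / (diffs + zeta * dmax)  # relational coefficients
    return coeffs.mean(axis=1)                  # average coefficient = grade per row
```

A grade of 1 indicates a candidate identical to the reference; smaller values indicate weaker relational similarity, which is the signal a similarity-based defect density predictor would exploit.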
Predicting software defects is an important task during the software testing phase, especially for allocating appropriate resources and prioritizing testing tasks. Typically, classification algorithms are used to accomplish this task with previously collected datasets. However, these datasets suffer from an imbalanced label distribution in which clean modules outnumber defective modules. Traditional classification algorithms cannot handle this property of defect datasets because they assume the datasets are balanced; if the problem is not addressed, the classifier will produce predictions biased towards the majority label. Several techniques in the literature are designed to address this problem, and most of them focus on data re-balancing. Recently, ensemble class imbalance techniques have emerged as an alternative to data re-balancing approaches. In software defect prediction, however, no studies have examined the performance of ensemble class imbalance learning against data re-balancing approaches. This paper investigates the efficiency of ensemble class imbalance learning for software defect prediction. We conducted a comprehensive experiment involving 12 datasets, six classifiers, nine class imbalance techniques, and 10 evaluation metrics. The experiments showed that ensemble approaches, particularly the UnderBagging technique, outperform traditional data re-balancing approaches, especially when dealing with datasets that have high defect ratios.
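The UnderBagging idea can be sketched as follows: each bagging round trains a base classifier on all defective modules plus an equal-size random sample of clean modules, and the ensemble predicts by majority vote. This is a minimal sketch assuming a generic base classifier with `fit`/`predict` methods, not the exact experimental setup of the paper.

```python
import numpy as np

class UnderBagging:
    """Bagging ensemble where each round undersamples the majority class."""

    def __init__(self, base_factory, n_estimators=10, seed=0):
        self.base_factory = base_factory      # callable returning a fresh classifier
        self.n_estimators = n_estimators
        self.rng = np.random.default_rng(seed)
        self.models = []

    def fit(self, X, y):
        minority = np.where(y == 1)[0]        # defective modules
        majority = np.where(y == 0)[0]        # clean modules
        for _ in range(self.n_estimators):
            # balanced training set: all minority + equal-size majority sample
            sampled = self.rng.choice(majority, size=len(minority), replace=False)
            idx = np.concatenate([minority, sampled])
            model = self.base_factory()
            model.fit(X[idx], y[idx])
            self.models.append(model)
        return self

    def predict(self, X):
        votes = np.mean([m.predict(X) for m in self.models], axis=0)
        return (votes >= 0.5).astype(int)     # majority vote
```

Because every base learner sees a balanced sample, no single model is pulled towards the majority label, while the vote across differently sampled rounds recovers some of the information discarded by undersampling.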