As the number of documents on the internet has grown, document clustering has become practically important. This has spurred interest in developing document clustering algorithms. Exploiting parallelism plays an important role in achieving fast, high-quality clustering. In this paper, we propose a parallel algorithm that adopts a hierarchical document clustering approach. Our focus is to exploit the sources of parallelism to improve performance and decrease clustering time. The proposed parallel algorithm is tested using a test-bed collection of 749 documents from CACM. A multiprocessor system based on message passing is used. Various parameters are considered for evaluating performance, including average inter-cluster similarity, speedup, and processor utilization. Simulation results show that the proposed algorithm improves performance, decreases the clustering time, and increases the overall speedup while still keeping a high clustering quality. As the number of processors increases, the clustering time decreases up to a certain point, beyond which adding more processors is no longer effective. Moreover, the algorithm is applicable to other document collections in different domains.
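The speedup and processor-related measures mentioned above can be illustrated with a minimal sketch. It assumes the conventional definitions: speedup as serial time divided by parallel time, and efficiency as speedup divided by the number of processors, which is one common proxy for utilization. The abstract does not state the exact definitions the authors used, and the timings below are hypothetical placeholders, not results from the paper.

```python
def speedup(serial_time, parallel_time):
    """Conventional speedup: time on one processor / time on p processors."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, processors):
    """Speedup per processor; a common (assumed) proxy for utilization."""
    return speedup(serial_time, parallel_time) / processors


# Hypothetical clustering times in seconds, keyed by processor count.
# These numbers are made up for illustration only.
timings = {1: 100.0, 2: 55.0, 4: 32.0, 8: 25.0}

for p, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, p)
    print(f"p={p}: speedup={s:.2f}, efficiency={e:.2f}")
```

With placeholder timings like these, speedup keeps rising while efficiency falls, which is the pattern behind the observation that adding processors eventually stops paying off.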
This paper proposes an efficient model for recognizing and classifying vehicle types. The model localizes each object in the image and then identifies the vehicle type. Image features are extracted using the histogram of oriented gradients (HOG) and ant colony optimization (ACO). The vehicle type is determined using several classifiers, namely k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and Softmax. The model is implemented and run on two datasets of vehicle images as test-beds. In the comparative study, the SVM outperforms the other classifiers, and it performs better with HOG features than with ACO features. HOG is modified by adding a Laplacian filter to select the most significant image features. The accuracy of the SVM classifier using the modified HOG exceeds that of the SVM using the traditional HOG. The proposed model is analyzed and discussed with respect to local geometric and photometric transformations such as illumination variations.
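As a rough illustration of the HOG idea, the sketch below builds an unsigned gradient-orientation histogram for a single image cell. It is a simplified, assumed version: the function name `cell_orientation_histogram` is hypothetical, bin interpolation and block normalization are omitted, and the Laplacian-filter modification described in the abstract is not reproduced because its details are not given here.

```python
import math


def cell_orientation_histogram(cell, bins=9):
    """Unsigned (0-180 degree) gradient-orientation histogram for one cell.

    `cell` is a 2D list of grayscale intensities. Border pixels are skipped
    so that central differences are always defined.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against a rounded angle of exactly 180.0.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Toy 4x4 cell whose intensity increases left to right (horizontal gradient).
ramp = [[float(x) for x in range(4)] for _ in range(4)]
print(cell_orientation_histogram(ramp))
```

In a full pipeline, histograms from many cells would be normalized and concatenated into the feature vector passed to a classifier such as an SVM.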
The main objective of clustering is to partition a set of objects into groups or clusters. The objects within a cluster are more similar to one another than to those in other clusters. This work analyzes, discusses, and compares three clustering algorithms. The algorithms are based on partitioning, hierarchical, and swarm intelligence approaches: k-means clustering, hierarchical agglomerative clustering, and ant clustering, respectively. The algorithms are tested using three different datasets. Three measurable criteria are used to evaluate their performance: intra-cluster distance, inter-cluster distance, and clustering time. The experimental results show that the k-means algorithm is faster and more easily understood than the other two algorithms. However, the k-means algorithm cannot determine the appropriate number of clusters and depends on the user to specify it in advance. One advantage of the hierarchical clustering algorithm is the ease with which it handles any form of similarity or distance. Its disadvantage involves the embedded flexibility regarding the level of granularity. The ant clustering algorithm can detect more similar data for larger values of the swarm coefficients. The ant clustering algorithm outperforms the other two algorithms, but only when the swarm parameters are well chosen; otherwise, agglomerative hierarchical clustering performs best.
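The two distance criteria named above can be sketched as follows. The definitions are assumptions, since the abstract does not spell them out: intra-cluster distance is taken as the mean Euclidean distance from each point to its cluster centroid, and inter-cluster distance as the mean Euclidean distance between pairs of centroids. The toy data is hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def intra_cluster_distance(clusters):
    """Mean distance from every point to its own cluster centroid (assumed definition)."""
    total, count = 0.0, 0
    for points in clusters:
        c = centroid(points)
        total += sum(math.dist(p, c) for p in points)
        count += len(points)
    return total / count


def inter_cluster_distance(clusters):
    """Mean distance between all pairs of cluster centroids (assumed definition)."""
    centroids = [centroid(points) for points in clusters]
    pairs = list(combinations(centroids, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 2D data already grouped into two clusters.
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]
print(intra_cluster_distance(clusters), inter_cluster_distance(clusters))
```

Under these definitions, a good clustering has a small intra-cluster distance and a large inter-cluster distance.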
Information retrieval aims to find all documents in a body of textual data that are relevant to a query. A good information retrieval system should retrieve only those documents that satisfy the user query. Although several models have been developed, most Arabic information retrieval models do not satisfy user needs. This is because the Arabic language is more powerful and has complex morphology as well as high polysemy. This paper first investigates the most recent Arabic information retrieval model and then presents two different approaches to enhance the effectiveness of the adopted model. The main idea of the proposed approaches is to modify and/or expand the user query. The first approach expands the user query using the semantics of words according to an Arabic dictionary. The second approach modifies and/or expands the user query by adding useful information from pseudo-relevance feedback. In other words, the query is modified by selecting relevant textual keywords to expand it and weeding out unrelated words. The adopted retrieval model and the two proposed approaches are implemented, tested, compared, and evaluated on an Arabic document collection. The obtained results show that the proposed approaches enhance the effectiveness of the Arabic information retrieval model by about 15% to 35%.
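Query expansion with pseudo-relevance feedback can be sketched in a few lines. This is a generic illustration under stated assumptions, not the authors' method: the top-ranked documents from an initial retrieval are treated as relevant, and their most frequent non-query terms are appended to the query. The function `expand_query`, its parameters, and the English toy tokens are hypothetical; Arabic tokenization, stemming, and term weighting are left out.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_docs=2, extra_terms=2,
                 stopwords=frozenset()):
    """Append the most frequent new terms from the top-ranked documents.

    `ranked_docs` is a list of token lists, best match first. Terms already
    in the query or in `stopwords` are filtered out before counting.
    """
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(
            t for t in doc if t not in query_terms and t not in stopwords
        )
    # Sort by descending frequency, then alphabetically for a stable result.
    candidates = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in candidates[:extra_terms]]


# Hypothetical initial ranking for the query ["clustering"].
ranked = [
    ["clustering", "document", "parallel"],
    ["document", "clustering", "hierarchical"],
    ["vehicle"],
]
print(expand_query(["clustering"], ranked))
```

The `stopwords` argument stands in for the step of weeding out unrelated words; a real system would use a more careful relevance criterion than raw frequency.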