This chapter presents an enhanced, effective, and simple approach to text classification. The approach uses an algorithm to classify documents automatically. The main idea of the algorithm is to select feature words from each document; together, these words cover all the ideas in the document. The algorithm outputs a list of the main subjects found in the document. The chapter also investigates the effect of Arabic text classification on information retrieval, with the goal of improving the convenience and effectiveness of information access. The system was evaluated on precision/recall criteria in two cases: without Arabic text classification and with Arabic text classification. A series of experiments was carried out to test the algorithm using 242 Arabic abstracts from the Saudi Arabian National Computer Conference. Additionally, automatic phrase indexing was implemented. The experiments showed that the system with text classification performs better than the system without text classification.
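The precision/recall criteria mentioned in the abstract above can be illustrated with a minimal sketch. The function name `precision_recall` and the document identifiers are hypothetical and chosen only for illustration; the abstract does not describe how the authors computed these measures, so this shows only the standard set-based definitions.

```python
def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall for one query.

    retrieved: document ids returned by the system (assumed input format)
    relevant:  document ids judged relevant for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant overall, 2 in common.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Comparing these two numbers for the same queries with and without the classification step is one way the two evaluation cases could be set side by side.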
The proposed methodology employs a novel statistical integrated graph-based sentence sensitivity ranking algorithm for text document clustering. Document clustering is the task of automatically grouping documents into meaningful clusters so that the documents within a group share the same topic. In this paper, a novel integrated graph-based methodology using sentence sensitivity ranking is first proposed to extract keyphrases from the documents. In the standard statistical approach, keyphrases are extracted on the basis of the sentence sensitivity ranking; in the graph-based method, the candidate keyphrases are automatically represented as graphs by applying the sentence sensitivity ranking. With the aid of the top-ranked keyphrases, document clustering is carried out by applying the proposed sentence sensitivity ranking algorithm. The simulation results show that the proposed graph-based text document clustering using the statistical integrated graph-based sentence sensitivity ranking algorithm obtained the best results for clustering the text documents.
Information retrieval (IR) systems use user feedback to generate optimal queries for a particular information need. However, the methods developed in IR for generating these queries do not retain information gathered from previous search processes, and hence cannot use such information in new search processes. Thus, a new search process cannot profit from the results of previous ones. Web information retrieval systems should be able to maintain results from previous search processes, thereby learning from previous queries and improving overall retrieval quality. In this chapter, we use the similarity of a new query to previously learned queries. We then expand the new query with terms extracted from documents that have been judged relevant to these previously learned queries. Thus, the new method uses global feedback information for query expansion, in contrast to the local feedback information that has been widely used in previous work on query expansion. Experimentally, we compared the new query expansion method with two conventional information retrieval methods, in local and global query expansion, to enhance the traditional information retrieval system. From the results, it can be concluded that although the performance of the traditional IR system is high, the PRF method increases the average recall and decreases the fallout measure.
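The idea of expanding a new query with terms from documents judged relevant to similar past queries can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the abstract does not specify the similarity measure, the threshold, or how expansion terms are selected. Here cosine similarity over term counts, the names `expand_query`, `min_sim`, and `n_terms`, and the similarity-weighted term scoring are all hypothetical choices.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand_query(new_query, learned, min_sim=0.3, n_terms=2):
    """Expand new_query using documents judged relevant to similar past queries.

    learned: list of (past_query, relevant_documents) pairs (assumed format).
    min_sim: assumed similarity threshold for reusing a past query.
    n_terms: assumed number of expansion terms to add.
    """
    query_terms = new_query.lower().split()
    q_vec = Counter(query_terms)
    candidates = Counter()
    for past_query, relevant_docs in learned:
        sim = cosine(q_vec, Counter(past_query.lower().split()))
        if sim < min_sim:
            continue  # past query too dissimilar to contribute terms
        for doc in relevant_docs:
            for term in doc.lower().split():
                if term not in q_vec:
                    candidates[term] += sim  # weight by query similarity
    extra = [term for term, _ in candidates.most_common(n_terms)]
    return query_terms + extra


# Hypothetical store of previously learned queries and their relevant documents.
learned = [
    ("arabic text classification",
     ["classification of arabic documents with stemming",
      "arabic stemming improves classification"]),
    ("image compression", ["wavelet image compression methods"]),
]
print(expand_query("arabic text retrieval", learned))
```

Because the expansion terms come from feedback stored across earlier searches rather than from the results of the current search alone, the sketch corresponds to the global rather than the local feedback setting. A practical version would also filter stop words before scoring candidate terms.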