The traditional statistical approach to synonym extraction is time-consuming, so a new method is needed to improve efficiency and accuracy. This research presents a new synonym extraction method called Noun Based Distinctive Verbs (NBDV), which replaces the traditional tf-idf weighting scheme with a new weighting scheme called the Orbit Weighing Scheme (OWS). The OWS links nouns to their semantic space by examining the singular verbs in each context. The new method was compared with important models in the field such as Skip-Gram, Continuous Bag of Words, and GloVe. The NBDV model was applied to the Arabic and English languages, and the results showed 47% recall and 51% precision in the dictionary-based evaluation and 57.5% precision in the human experts’ evaluation. Compared with synonym extraction based on tf-idf, the NBDV obtained 11% higher recall and 10% higher precision. Regarding efficiency, we found that, on average, extracting the synonyms of a single noun requires processing 186 verbs, and in 63% of the runs the number of singular verbs was less than 200. It is concluded that the developed method is efficient and processes a single run in linear time.
This chapter presents an enhanced, effective, and simple approach to text classification. The approach uses an algorithm to classify documents automatically. The main idea of the algorithm is to select feature words from each document such that those words cover all the ideas in the document. The result of the algorithm is a list of the main subjects found in the document. The chapter also investigates the effects of Arabic text classification on Information Retrieval, with the goal of improving the convenience and effectiveness of information access. The system was evaluated on precision/recall criteria in two cases: without Arabic text classification and with Arabic text classification. A series of experiments was carried out to test the algorithm using 242 Arabic abstracts from the Saudi Arabian National Computer Conference. Additionally, automatic phrase indexing was implemented. The experiments revealed that the system with text classification performs better than the system without text classification.
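As a rough illustration of the precision/recall criteria mentioned above, the following minimal sketch computes both measures for a single query. The function name and the document identifiers are hypothetical and are not taken from the chapter; it only shows the standard set-based definitions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids returned by the system (hypothetical ids).
    relevant:  document ids judged relevant for the query.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: 4 documents retrieved, 3 relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

Running the same computation over the results obtained with and without classification would give the two sets of figures that such a comparison relies on.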
The use of Latent Semantic Analysis (LSA) in text mining demands considerable space and time. This paper proposes a new text extraction method that sets out a framework for employing statistical semantic analysis in text extraction efficiently. The method uses the centrality feature and omits the segments of the text that have high verbatim, statistical, or semantic similarity to previously processed segments. The identification of similarity is based on a new multi-layer similarity method that computes the similarity in three statistical layers: it uses the Jaccard similarity in the first layer, the Vector Space Model (VSM) in the second, and the LSA in the third. The multi-layer similarity restricts the use of the third layer to the segments whose similarity the first and second layers failed to estimate. The ROUGE tool is used in the evaluation, but because ROUGE does not consider the extract's size, we supplemented it with a new evaluation strategy based on the compression rate and the ratio of sentence intersections between the automatic and the reference extracts. Our comparisons with classical LSA and traditional statistical extraction showed that we reduced the use of the LSA procedure by 52% and obtained a 65% reduction in the original matrix dimensions, while also obtaining remarkable accuracy results. It is concluded that employing the centrality feature with the proposed multi-layer framework yields an effective solution in terms of efficiency and accuracy in the field of text extraction.
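The layered idea described above can be sketched as a cascade in which the expensive third layer runs only when the two cheaper layers do not settle the comparison. This is a minimal sketch under stated assumptions, not the paper's implementation: the whitespace tokenizer, the thresholds `t1` and `t2`, the raw term-frequency vectors used for the VSM layer, and the `deep_layer` callable standing in for an LSA similarity are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokens(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def jaccard(a, b):
    # Layer 1: overlap of the two token sets.
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine_tf(a, b):
    # Layer 2: cosine similarity of raw term-frequency vectors (a basic VSM).
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def layered_similarity(a, b, deep_layer, t1=0.8, t2=0.8):
    """Return (layer_used, score); thresholds t1 and t2 are assumed values."""
    s1 = jaccard(a, b)
    if s1 >= t1:
        return 1, s1
    s2 = cosine_tf(a, b)
    if s2 >= t2:
        return 2, s2
    # Layer 3: only reached when the cheaper layers did not flag similarity.
    return 3, deep_layer(a, b)
```

In this sketch a caller would pass an LSA-based function as `deep_layer`; counting how often the cascade returns layer 3 gives a direct measure of how much of the costly procedure is avoided.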