2017
DOI: 10.11591/ijece.v7i6.pp3705-3710

Arabic Book Retrieval using Class and Book Index Based Term Weighting

Abstract: One of the most common issues in information retrieval is document ranking. A document ranking system collects search terms from the user and retrieves documents in order of relevance. Vector space models based on TF.IDF term weighting are the most common method for this task. In this study, we are concerned with the automatic retrieval of an Islamic Fiqh (Law) book collection. This collection contains many books, each of which has tens to hundreds of pages. Each page of the book is treated as a …
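The TF.IDF vector space baseline that the abstract mentions can be sketched as follows. This is a minimal illustration only, not the class and book index based weighting the paper proposes (the visible abstract is truncated before that method is described). The function names `tfidf_vectors`, `cosine`, and `rank`, the raw-count tf, and the `log(N/df)` idf variant are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one {term: weight} dict per tokenized document.

    Assumed weighting: raw term frequency times log(N / df).
    Returns the document vectors and the idf table.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection get weight 0.
    q_vec = {term: tf * idf.get(term, 0.0) for term, tf in Counter(query).items()}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    # Sort by descending score; ties keep the original document order.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

In the setting the abstract describes, each entry of `docs` would be the token list of one page, so `rank` orders pages rather than whole books.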

Cited by 20 publications (23 citation statements)
References 20 publications
“…In the tokenization process, each document is split into smaller units called tokens [11]. In this step, all letters are converted into lowercase and some characters like punctuation, numbers, and HTML tags are also removed [12][13]. In filtering, uninformative words are removed based on the existing stoplist by Tala [14].…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…This process aims to separate each word and to determine which characters are treated as word separators. The tokenizing process relies on the space character in the document as a word separator [15]. b) Filtering…”
Section: Text Preprocessing
Citation type: mentioning
confidence: 99%
“…Preprocessing is conducted before the main process begins. The steps conducted in this stage include tokenization, case folding and cleaning [40][41][42][43]. In tokenization, each review is split into smaller units called tokens or terms [44].…”
Section: Preprocessing
Citation type: mentioning
confidence: 99%
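
The preprocessing steps that the citation statements above describe (cleaning, case folding, whitespace tokenization, stoplist filtering) can be sketched roughly as below. This is an illustrative assumption, not code from any of the citing papers: the function name `preprocess`, the regular expressions, and the tiny English `STOPLIST` are placeholders, and the Tala stoplist mentioned in the statements is not reproduced here.

```python
import re

# Placeholder stoplist for illustration only; a real system would load
# a full stoplist for its target language.
STOPLIST = {"the", "is", "a", "of", "and"}

TAG_RE = re.compile(r"<[^>]+>")          # anything that looks like an HTML tag
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # punctuation, digits, other symbols


def preprocess(text, stoplist=STOPLIST):
    """Clean, case-fold, tokenize, and filter one document.

    Assumes Latin-script input: the character class above would also strip
    Arabic letters, so it must be adapted for Arabic text.
    """
    text = TAG_RE.sub(" ", text)          # cleaning: drop HTML tags
    text = text.lower()                   # case folding
    text = NON_LETTER_RE.sub(" ", text)   # cleaning: drop punctuation and numbers
    tokens = text.split()                 # tokenization on whitespace
    return [tok for tok in tokens if tok not in stoplist]  # filtering
```

For example, `preprocess("<p>The Book of Fasting, 3 editions!</p>")` yields `["book", "fasting", "editions"]`.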