2022
DOI: 10.11591/ijeecs.v25.i3.pp1501-1507

Automatic construction of generic stop words list for Hausa text

Abstract: Stop-words are words that have the highest frequencies in a document yet carry no significant information. They are characterized by having common relations within a cluster. They are the noise of the text and are evenly distributed over a document. Removal of stop words improves the performance and accuracy of information retrieval algorithms and of machine learning at large. It saves storage space by reducing the vector space dimension, and helps in effective document indexing. This re…
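The abstract's characterization of stop words as the highest-frequency words can be illustrated with a naive frequency count. This is only a sketch under stated assumptions, not the paper's construction method (the abstract is truncated before the method is described). The function name `candidate_stop_words`, the cutoff `top_n`, and the toy documents are all hypothetical.

```python
from collections import Counter

def candidate_stop_words(documents, top_n):
    """Return the top_n most frequent lowercase tokens across the documents.

    Assumption: raw corpus frequency is used as the only signal. The paper
    itself may use additional criteria that are not visible in this excerpt.
    """
    counts = Counter()
    for doc in documents:
        # Whitespace tokenization; lowercasing merges case variants.
        counts.update(doc.lower().split())
    return [word for word, _ in counts.most_common(top_n)]

# Toy input invented for illustration; not data from the paper.
docs = ["ya na karatu", "ya na UTM", "ya a Jamiar"]
candidates = candidate_stop_words(docs, 2)
print(candidates)
```

On a real corpus, the candidates would still need manual or statistical filtering, since frequent content words can also rank highly.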

Citations: cited by 3 publications (3 citation statements)
References: 26 publications
“…A Hausa stemmer [ 65 ] was used to normalize words to their stem form and stop words were removed for better scoring accuracy. A list of Hausa stop words [ 66 ] was used in this study, and punctuation, non-letters, and other special characters were removed from the input text documents. We consider the following Hausa sentence: “Abubakar ya na karatu a Jamiar UTM.” The sentence is tokenized as follows: “Abubakar,” “ya,” “na,” “karatu,” “a,” “Jamiar,” and “UTM” using a space as a separator between tokens.…”
Section: Methods (mentioning)
confidence: 99%
“…We consider the following Hausa sentence: “Abubakar ya na karatu a Jamiar UTM.” The sentence is tokenized as follows: “Abubakar,” “ya,” “na,” “karatu,” “a,” “Jamiar,” and “UTM” using a space as a separator between tokens. The words “ya,” “na,” and “a” are stop-words according to the list [ 66 ], leaving only “Abubakar,” “karatu,” “Jamiar,” and “UTM” and the word “Jamiar” is stemmed to Jamia, according to the stemmer [ 65 ].…”
Section: Methods (mentioning)
confidence: 99%
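The preprocessing steps described in the statement above (space tokenization, stop-word removal, stemming) can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word set contains only the three words named in the quotation, not the full cited list [ 66 ], and `STEM_LOOKUP` is a hypothetical stand-in for the cited stemmer [ 65 ] that covers only the one example given.

```python
import string

# Assumption: only the three stop words named in the quoted statement.
HAUSA_STOP_WORDS = {"ya", "na", "a"}

# Hypothetical stand-in for the cited stemmer: a lookup for the single
# stemming example that appears in the quotation.
STEM_LOOKUP = {"Jamiar": "Jamia"}

def preprocess(sentence):
    """Tokenize on spaces, strip punctuation, drop stop words, then stem."""
    tokens = sentence.split(" ")
    # Remove leading and trailing punctuation such as the final period.
    cleaned = [token.strip(string.punctuation) for token in tokens]
    cleaned = [token for token in cleaned if token]
    # Compare in lowercase so capitalized stop words are also removed.
    content = [t for t in cleaned if t.lower() not in HAUSA_STOP_WORDS]
    return [STEM_LOOKUP.get(token, token) for token in content]

result = preprocess("Abubakar ya na karatu a Jamiar UTM.")
print(result)
```

Removing stop words before stemming keeps the lookup small, since function words never reach the stemmer.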
“…This high computational cost is caused by the dimensionality curse and requires larger computer memory and computational time. Furthermore, in information retrieval experiments, it has been shown that removing stopwords improves precision significantly when compared with when they are not removed [3,4]. Stopwords also play a significant role in feature extraction [5,6], topic modeling [7], classification [8], ontology construction [9], and keyword extraction [10].…”
Section: Introduction (mentioning)
confidence: 99%