2022
DOI: 10.1007/s11063-022-10990-8

Evaluating Various Tokenizers for Arabic Text Classification

Cited by 10 publications (6 citation statements)
References 34 publications
“…These tokens could be single words (noun, verb, pronoun, etc.) that have been altered without regard for their meanings or relationships [56,57].…”
Section: Preprocessing; citation type: mentioning
confidence: 99%
“…The added morphemes could be at the beginning (affixes), at the middle (infixes), or at the end (suffixes). [7], [8] A single Arabic word can relatively take the form of a whole sentence when translated to other languages. For example, ‫"فسيكفيكهم"‬ means "He will suffice you against them" in English.…”
Section: Challenges Of Arabic IR; citation type: mentioning
confidence: 99%
“…Tokenization is the process of splitting text into single words [9]. One possible way to do so is to split the document sentences into a list of tokens using white spaces [7]. However, there are non-segmented languages, like Chinese, which does not have white spaces between words [10]. The generated tokens might be any contiguous sequence of letters or numbers.…”
Section: Tokenization; citation type: mentioning
confidence: 99%
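
The whitespace splitting described in the statement above can be illustrated with a minimal Python sketch. It is not taken from the cited paper or from any citing work; the function names `whitespace_tokenize` and `alnum_tokenize` and the sample sentence are hypothetical, chosen only for illustration. The second function shows one possible reading of "any contiguous sequence of letters or numbers".

```python
import re


def whitespace_tokenize(text: str) -> list[str]:
    """Split on runs of whitespace; punctuation stays attached to words."""
    return text.split()


def alnum_tokenize(text: str) -> list[str]:
    """Return every contiguous run of letters or digits (Unicode-aware)."""
    # [^\W_]+ matches word characters other than the underscore.
    return re.findall(r"[^\W_]+", text)


# Hypothetical sample sentence, not from the source.
sentence = "Tokenization splits text, e.g. 3 words."
print(whitespace_tokenize(sentence))
print(alnum_tokenize(sentence))

# The Arabic word quoted in the statement above contains no white space,
# so whitespace splitting returns it as a single token.
arabic_word = "فسيكفيكهم"
print(len(whitespace_tokenize(arabic_word)))
```

Neither function separates the morphemes inside a single Arabic word, and whitespace splitting does not work for non-segmented languages such as Chinese, which is the limitation the quoted statements point to.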
“…There is a considerable number of tokens and sentences, a corpus contains several word forms. It is beneficial in the linguistic analysis of a language and used in a variety of the NLP applications, such as morphology, syntax, semantics, and pragmatics [2]. There is a notable lack of resources for the machine-readable Kurdish language corpora, in both raw and annotated forms.…”
Section: Introduction; citation type: mentioning
confidence: 99%