2018 3rd International Conference on Pattern Analysis and Intelligent Systems (PAIS) 2018
DOI: 10.1109/pais.2018.8598524

A New Multi Varied Arabic Corpus

Citations: cited by 3 publications (2 citation statements)
References: 8 publications
“…When scraping Arabic sites, text encodings must be in UTF-8 for the text to be processed by NLP. This also accounts for the Arabic text direction, from right to left, and proper encoding ensures that this feature is recognized (Meskaldji et al, 2018). Several technical issues are that, 1) Arabic sites store limited data due to high database costs; 2) Security features on many Arabic sites can hinder scraping efforts.…”
Section: Corpora Building
Citation type: mentioning
confidence: 99%
“…This corpus can be used for the text classification process. In addition, Arabic Text Corpus [35] is an Arabic text corpus with more than 233k words built from three different sources: Quranic text, Classical Arabic text, and Modern Arabic text. The corpus was collected from the Quran, contemporary Arabic corpora, and the InAra Arabic corpus.…”
Section: Literature Review
Citation type: mentioning
confidence: 99%