2013 ACS International Conference on Computer Systems and Applications (AICCSA) 2013
DOI: 10.1109/aiccsa.2013.6616474

Building Arabic corpora from Wikisource

Citations: cited by 5 publications (3 citation statements)
References: 1 publication

“…We used specifically the test part of each corpus. The Arabic corpus (InAra) (Bensalem et al. 2013a, 2013b) has been built by ourselves, following PAN annotation standards, and has been used in AraPlagDet 2015, the first plagiarism detection competition on Arabic documents (Bensalem et al. 2015).…”
Section: Datasets and Performance Measures; type: mentioning
confidence: 99%
“…Commonly applied syntax analysis tools include Penn Treebank, Citar, TreeTagger, and Stanford parser. Several papers present resources for Arabic [33,34,227] and Urdu [54] language processing.…”
Section: Preprocessing; type: mentioning
confidence: 99%
“…We finally decided to build our corpus from Arabic Wikisource which is a library of heritage books and public domain texts. Furthermore, most of its documents are tagged with topics and author names (see our paper [19] for further details on the text compilation from Wikisource). We also added some texts from other sources, after making sure that they are without copyright.…”
Section: C1; type: mentioning
confidence: 99%