2014
DOI: 10.1007/s10579-014-9274-3

Creating language resources for under-resourced languages: methodologies, and experiments with Arabic

Abstract: Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) usi…

Cited by 47 publications (27 citation statements)
References 60 publications (49 reference statements)
“…• A templatic pattern in the form of consonant-vowel arrangement picturing the morpho-phonetic structure of a word form completing possible phonetic, syntactic, and semantic information 6 .…”
Section: A Preliminary Note On Non-linear and Cognitive Aspects Of Th… (mentioning)
confidence: 99%
“…Examples include as follows: 6 In the statistical analysis of each pattern, syntactical suffixes are also considered to form multiple phonetic patterns conveying additional syntactical information at the end of a pattern.…”
Section: A Preliminary Note On Non-linear and Cognitive Aspects Of Th… (mentioning)
confidence: 99%
“…[10] showed that the quality of Arabic workers available in Mechanical Turk was not satisfying enough for POS (Part Of Speech) tagging or grammatical case annotation. However, [11] used with satisfaction the same platform for building an Arabic corpus for the much easier task of text summarization.…”
Section: Related Work (mentioning)
confidence: 99%
“…The document collection we used contains 160 documents selected from four general purpose, freely available corpora: 46 documents from Arabic Newspapers Corpus (ANC) [12], 53 from Corpus of Contemporary Arabic (CCA) [13], 31 from Essex Arabic Summaries Corpus (EASC) [14], [11], and 30 from Open Source Arabic Corpora (OSAC) [15].…”
Section: A Document Collection (mentioning)
confidence: 99%