Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP) 2014
DOI: 10.3115/v1/w14-3613

Tunisian dialect Wordnet creation and enrichment using web resources and other Wordnets

Abstract: In this paper, we propose TunDiaWN (Tunisian dialect Wordnet), a lexical resource for the dialect spoken in Tunisia. Our TunDiaWN construction approach is founded, on the one hand, on a corpus-based method to analyze and extract Tunisian dialect words. A clustering technique is adapted and applied to mine the possible relations between the extracted Tunisian dialect words and to group them into meaningful groups. All these suggestions are then evaluated and validated by the experts to perform the …

Cited by 15 publications (8 citation statements)
References: 9 publications
“…Work on the construction of TD ontologies includes the Wordnet 'TunDiaWN' proposed by Bouchlaghem et al (2014). The Wordnet was constructed from a corpus named multi-source Tunisian dialect corpus (MultiTD), collected from various sources (social networks, written plays, dictionaries, transcription of speeches, etc.).…”
Section: Related Work (mentioning)
confidence: 99%
“…The followed approach results in a unique form of each transcribed word and therefore cannot cover the written form of TD, especially the one that is daily produced on the social web, which is, as shown in Section 2, rich, varied and does not conform to specific rules or standards. Other TD resources were built using existing MSA resources (Boujelbane et al, 2013b; Bouchlaghem et al, 2014). Although benefiting from MSA corpora and tools helps importantly overcoming the lack of LRs, this approach does not consider TD language productions that are not MSA derived, especially when dealing with the TD used on the social web.…”
Section: Related Work (mentioning)
confidence: 99%
“…Cavalli‐Sforza et al () created an Iraqi Arabic WordNet using an English‐Iraqi dictionary and the modern standard Arabic version of WordNet as well as the English WordNet. Moreover, a Tunisian dialect WordNet was built in Bouchlaghem, Elkhlifi, and Faiz () starting from a Tunisian corpus.…”
Section: Native Language Language Varieties and Dialects Identifica (mentioning)
confidence: 99%
“…In (Cavalli-Sforza et al, 2013) an Iraqi WordNet is presented based on the MSA WordNet, the English WordNet, and an English-Iraqi dictionary. A Tunisian dialect WordNet was built in (Bouchlaghem & Elkhlifi, 2014) starting from a Tunisian corpus.…”
Section: Building Lexicons and Lexical Analysis (mentioning)
confidence: 99%