Proceedings of the 10th Workshop on Multiword Expressions (MWE) 2014
DOI: 10.3115/v1/w14-0820

Unsupervised Construction of a Lexicon and a Repository of Variation Patterns for Arabic Modal Multiword Expressions

Abstract: We present an unsupervised approach to build a lexicon of Arabic Modal Multiword Expressions (AM-MWEs) and a repository of their variation patterns. These novel resources are likely to boost the automatic identification and extraction of AM-MWEs.

Citations: cited by 6 publications (5 citation statements)
References: references 5 publications (2 reference statements)
“…Probably one of the biggest categories of MWE-L formalisms would be those based on phrase grammars. We further divide this category into two smaller: (i) formalisms based on list-like or regex-like structures (Breidt et al, 1996; Alegria et al, 2004; Oflazer et al, 2004; Sailer and Trawiński, 2006; Spina, 2010; Quochi et al, 2012; Al-Sabbagh et al, 2014; Al-Haj et al, 2014; Walsh et al, 2019), component words are listed in the order in which they can appear and discontinuities are most often denoted by special symbols imposing constraints on the types of insertions allowed (either by limiting the number of insertions or the words which can be inserted); (ii) formalisms based on more expressive phrase grammars (CFGs, TAGs, LFGs, HPSGs, ...) (Grégoire, 2010; Przepiórkowski et al, 2017; Savary et al, 2018b; Dyvik et al, 2019), here component words are usually terminals appearing in grammar rules, and discontinuities are denoted by non-terminals.…”
Section: MWE-lexicon Formalisms (mentioning)
confidence: 99%
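
The list-like, regex-like style described in this statement can be illustrated with a small sketch. It is not the notation of any of the cited formalisms: the function name `compile_pattern`, the `max_gap` parameter, and the English toy expression are all assumptions made for illustration. Component words are listed in order, and the gap between neighbouring components is limited to a fixed number of inserted tokens.

```python
import re

def compile_pattern(components, max_gap):
    """Build a regex that matches the component words in the given order,
    allowing at most `max_gap` inserted tokens between neighbouring components.
    (Hypothetical helper; the gap limit stands in for the "special symbols"
    that constrain insertions in list-like MWE lexicon formalisms.)"""
    # A gap is zero to max_gap whitespace-separated tokens, then whitespace.
    gap = r"(?:\s+\S+){0,%d}?\s+" % max_gap
    body = gap.join(re.escape(word) for word in components)
    # Lookarounds keep the first and last components aligned to token boundaries.
    return re.compile(r"(?<!\S)" + body + r"(?!\S)")

# Toy entry: "take ... into account" with at most two inserted tokens per gap.
pattern = compile_pattern(["take", "into", "account"], max_gap=2)

def matches(sentence):
    """Return True if the sentence contains the expression within the gap limit."""
    return pattern.search(sentence) is not None
```

Under these assumptions, `matches("we take this fact into account")` accepts the two inserted tokens, while a sentence with four tokens between "take" and "into" is rejected. Restricting which words may be inserted, the other constraint the statement mentions, would replace `\S+` with an alternation over the permitted words.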
“…The seed sentences were translated by native speakers into their own dialects to create a parallel multi-dialectal corpus in addition to English. Cotterell and Callison-Burch (2014) extended the work of Al-Sabbagh and Girju (2010) and Zaidan and Callison-Burch (2011) to build a collection of commentaries from five Arabic newspapers and tweets that was used for automatic dialect identification. Duh and Kirchhoff (2005) used CallHome Egyptian Colloquial Arabic to build a POS tagger for Egyptian and achieved an accuracy of 69.83%.…”
Section: Introduction (mentioning)
confidence: 99%
“…Prior to the beginning of the interactive procedure, we highlight all event modalities in each tweet using a string-match algorithm and the lexicons from Al-Sabbagh et al (2013, 2014a). The algorithm finds all potential event modality triggers (i.e.…”
Section: Annotation Scheme: Tasks and Guidelines (mentioning)
confidence: 99%
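
A lexicon-driven string-match highlighter of the kind this statement mentions could look roughly like the following. This is a minimal sketch, not the citing authors' algorithm: the function name `highlight_triggers`, the bracket markers, and the English placeholder lexicon are assumptions, and the real Arabic modality lexicons are not reproduced here.

```python
import re

def highlight_triggers(text, lexicon, open_tag="[", close_tag="]"):
    """Wrap every lexicon entry found in `text` with marker strings.
    (Hypothetical helper; entries are matched as literal strings.)"""
    if not lexicon:
        return text
    # Longest entries first, so a multiword trigger wins over a shorter
    # entry that happens to be its prefix.
    entries = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(entry) for entry in entries)
    # Require that the match is not embedded inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:%s)(?!\w)" % alternation)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)

# Placeholder lexicon for illustration only.
toy_lexicon = ["must", "have to", "have"]
highlighted = highlight_triggers("we must go and they have to stay", toy_lexicon)
```

The harvesting condition quoted in the next statement (at least one candidate trigger per tweet) could reuse the same compiled pattern with `search` instead of `sub`.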
“…Tweets are harvested from the Arabic Egyptian Twitter provided that (1) each tweet has at least one trendy political English or Arabic hashtag; and (2) each tweet has at least one candidate event modality trigger from the Arabic modality lexicons (Al-Sabbagh et al 2013, 2014a). We harvest tweets from a variety of users such as newspapers, TV stations, political and humanitarian campaigns, politicians, celebrities, and ordinary people.…”
Section: Corpus Harvesting (mentioning)
confidence: 99%