2023
DOI: 10.48550/arxiv.2302.03126
Preprint

Context-Gloss Augmentation for Improving Arabic Target Sense Verification

Abstract: The Arabic language lacks semantic datasets and sense inventories. The most common semantically labeled dataset for Arabic is ArabGlossBERT, a relatively small dataset of 167K context-gloss pairs (about 60K positive and 107K negative pairs) collected from Arabic dictionaries. This paper enriches the ArabGlossBERT dataset by augmenting it with machine back-translation (Arabic-English-Arabic). Augmentation increased the dataset size to 352K pairs (149K positive and 203K negative…

Cited by 1 publication (2 citation statements)
References 17 publications
“…While few studies have focused on augmenting Arabic data [28]-[31], [49], some have used the current DA noising and paraphrasing-based approaches without employing the transformer's powerful models as augmentation techniques [29]-[31], [50]. Other studies have employed transformers to evaluate the augmented Arabic dataset [34], [35].…”
Section: Literature Review
Citation type: mentioning
confidence: 99%
“…Swapping [4], [22]; Deletion [4], [9]; Insertion [4], [23]; Substitution [4], [24]; Mixup [4], [15]
English, paraphrasing-based techniques: Thesauruses [6], [12]; Rules [13], [14]; Machine translation [21]; Transformers [13], [35], [47], [48]
Arabic, noising-based techniques: Swapping [28], [36]; Deletion [28], [49]; Insertion [29], [31], [53]; Substitution [30], [31], [36], [53]; Mixup [30], [31], [39], [53]
Arabic, paraphrasing-based techniques: Thesauruses [31], [34], [40]; Rules [29], [31], [35]; Machine translation [50]; Transformers…”
Section: Noising-based Techniques
Citation type: mentioning
confidence: 99%