2022
DOI: 10.1287/ijds.2022.0016
HeBERT and HebEMO: A Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition

Abstract: Sentiment analysis of user-generated content (UGC) can provide valuable information across numerous domains, including marketing, psychology, and public health. Currently, there are very few Hebrew models for natural language processing in general, and for sentiment analysis in particular; indeed, it is not straightforward to develop such models because Hebrew is a morphologically rich language (MRL) with challenging characteristics. Moreover, the only available Hebrew sentiment analysis model, based on a recu…
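As a rough illustration of what HeBERT/HebEMO-style tooling enables downstream, here is a minimal sketch of Hebrew polarity analysis through the Hugging Face pipeline API. The checkpoint name avichr/heBERT_sentiment_analysis and the returned label set are assumptions about the public hub release, not details confirmed by this report.

```python
# Minimal sketch (assumed hub checkpoint): Hebrew polarity analysis with a
# HeBERT-based classifier served through the Hugging Face pipeline API.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",  # assumed checkpoint name
)

# "This movie was wonderful" -> e.g. [{'label': ..., 'score': ...}]
print(sentiment("הסרט הזה היה נהדר"))
```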

Cited by 18 publications (4 citation statements)
References 38 publications
“…Compared to fully fine-tuned models, adapter models only incorporate a few task-specific parameters for each new task. The BERT-based experiments were conducted on three pretrained language models, each capable of handling Hebrew text, that is, (a) XLM-RoBERTa (Conneau et al., 2019), a multilingual language model based on the RoBERTa architecture (Liu et al., 2019), (b) HeBERT (Chriqui & Yahav, 2021), a monolingual BERT model trained on Hebrew data, and (c) AlephBERT (Seker et al., 2022), another monolingual BERT-based model trained on a large Hebrew vocabulary of 52K tokens optimized via masked-token prediction. Corresponding variants of these BERT models used lightweight adapter solutions, training only a small number of task-specific parameters with bottleneck adapters (Houlsby et al., 2019) and mix-and-match (MAM) adapters (He et al., 2021).…”
Section: Methods (mentioning)
confidence: 99%
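To make the model lineup in the quote above concrete, the sketch below loads the three Hebrew-capable backbones with the plain Hugging Face transformers API. The hub identifiers (xlm-roberta-base, avichr/heBERT, onlplab/alephbert-base) are assumptions about the public checkpoint names, and the bottleneck/MAM adapter variants mentioned in the quote would be attached on top with an adapter library rather than reproduced here.

```python
# Minimal sketch (assumed hub IDs): loading the three Hebrew-capable backbones
# named in the quote for a sequence classification task.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINTS = {
    "xlm-roberta": "xlm-roberta-base",      # multilingual RoBERTa
    "hebert": "avichr/heBERT",              # monolingual Hebrew BERT
    "alephbert": "onlplab/alephbert-base",  # monolingual BERT, 52K-token vocab
}

backbones = {}
for name, ckpt in CHECKPOINTS.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=2)
    backbones[name] = (tokenizer, model)

# In the adapter setting, bottleneck (Houlsby et al., 2019) or MAM (He et al.,
# 2021) modules would be inserted into each backbone and only those small
# task-specific parameter sets trained, with the pretrained weights frozen.
```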
“…In general, NLP models and solutions for low-resource languages are extremely limited. In Hebrew, two pre-trained language models were published, HeBERT (Chriqui and Yahav, 2021) and AlephBERT (Seker et al., 2022). We used AlephBERT, which is freely available, was trained on a larger dataset than HeBERT, and outperformed HeBERT on a variety of natural language tasks.…”
Section: Related Work (mentioning)
confidence: 99%
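Both monolingual models named above are pretrained with masked-token prediction; as a quick illustration of that objective at inference time, here is a minimal fill-mask sketch with AlephBERT. The hub ID onlplab/alephbert-base and the example sentence ("Jerusalem is the [MASK] of Israel") are assumptions for illustration only.

```python
# Minimal sketch (assumed hub ID): masked-token prediction with AlephBERT.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="onlplab/alephbert-base")

# "Jerusalem is the [MASK] of Israel" -> top candidate tokens with scores.
print(fill_mask("ירושלים היא [MASK] של ישראל"))
```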
“…The second event is marked similarly, but replacing 1 with 2. We use four Hebrew language models: AlephBERT (Seker et al., 2022), HeBERT (Chriqui and Yahav, 2022), mBERT cased (bert-base-multilingual) (Devlin et al., 2019), and AlephBERTGimmel (Guetta et al., 2022), all having 110M parameters, obtained directly from Hugging Face's transformers library. Inspired by Soares et al. (2019), we experiment with three sequence classification architectures:…”
Section: TRC Models (mentioning)
confidence: 99%
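As a rough sketch of one such sequence classification setup (a classifier head over a sentence in which the two events are explicitly marked), the code below wraps the events in bracket-style markers and runs HeBERT's classification head over the pair. The marker strings, the four-way label space, and the hub ID avichr/heBERT are illustrative assumptions, not the architectures of the cited work.

```python
# Minimal sketch (assumed markers, label count, and hub ID): temporal relation
# classification over a sentence with two explicitly marked events.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "avichr/heBERT"  # assumed hub name for HeBERT
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(
    ckpt, num_labels=4  # e.g. BEFORE / AFTER / EQUAL / VAGUE (illustrative)
)

# Register the event markers as special tokens and grow the embedding matrix.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]}
)
model.resize_token_embeddings(len(tokenizer))

# "He [E1] finished [/E1] the work before he [E2] went [/E2] home."
text = "הוא [E1] סיים [/E1] את העבודה לפני שהוא [E2] יצא [/E2] הביתה"
inputs = tokenizer(text, return_tensors="pt")
logits = model(**inputs).logits

# The head is randomly initialized here; it would be fine-tuned on labeled
# temporal-relation data before the argmax below is meaningful.
print(logits.argmax(dim=-1))
```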