2021
DOI: 10.3390/fi13110275

Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets

Abstract: The availability of different pre-trained semantic models has enabled the quick development of machine learning components for downstream applications. However, even when texts are abundant for a low-resource language, very few semantic models are publicly available for it. Most of the publicly available pre-trained models are built as multilingual versions of semantic models that do not fit the needs of low-resource languages well. We introduce different semantic models for Amharic, a morphologically…

Cited by 11 publications (7 citation statements)
References 33 publications
“…The success of GPT and similar models has led to the development of conversational AI models by other companies and research organizations. For instance, Google’s Bidirectional Encoder Representations from Transformers (BERT) and Facebook’s RoBERTa models (a reimplementation of BERT with some modifications to the key hyperparameters and minor embedding tweaks) were trained on even larger text datasets and achieved state-of-the-art results in a range of NLP tasks [9, 10].…”
Section: Results
Citation type: mentioning (confidence: 99%)
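Both checkpoints named in this statement are published on the Hugging Face hub, so a minimal, hedged sketch of loading them for feature extraction may be useful; the model identifiers (bert-base-uncased, roberta-base) and the transformers calls shown are standard public usage, not details taken from the cited paper.

```python
# Minimal sketch: loading the pre-trained BERT / RoBERTa checkpoints
# mentioned above via Hugging Face transformers (assumed setup, not
# taken from the cited paper): pip install transformers torch
from transformers import AutoTokenizer, AutoModel

for checkpoint in ("bert-base-uncased", "roberta-base"):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    # Encode a sample sentence and inspect the contextual embeddings.
    inputs = tokenizer("Pre-trained models enable quick NLP development.",
                       return_tensors="pt")
    outputs = model(**inputs)
    print(checkpoint, outputs.last_hidden_state.shape)
```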
“…Segmentation of sentences essentially involves the disambiguation of end-of-sentence punctuation. For Amharic, we have used the available Python-based Amharic sentence segmentation module (pip install amseg) (Yimam et al., 2021; Belay et al., 2021).…”
Section: Data Pre-processing
Citation type: mentioning (confidence: 99%)
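Since this statement points at a concrete tool (pip install amseg), a minimal usage sketch follows; the import path amseg.amharicSegmenter, the AmharicSegmenter class, and the tokenize_sentence method follow the package's published examples and should be treated as assumptions that may vary across versions.

```python
# Minimal sketch of Amharic sentence segmentation with amseg
# (pip install amseg). Import path and constructor signature follow
# the package's published examples; treat them as assumptions.
from amseg.amharicSegmenter import AmharicSegmenter

sent_punct = []   # empty list -> use the package's default sentence punctuation
word_punct = []   # empty list -> use the package's default word punctuation
segmenter = AmharicSegmenter(sent_punct, word_punct)

text = "አበበ በሶ በላ። ጫላ ጩቤ ጨበጠ፡፡"
sentences = segmenter.tokenize_sentence(text)  # splits on ። and ፡፡
print(sentences)
```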
“…Some research has been conducted to create pre-trained Amharic models, among them fastText [37,38], word2vec [26,38–40], and XLMR [41]. Some of these models were trained for cross-lingual purposes and are not usable for the needs of most NLP tasks.…”
Section: Amharic Language
Citation type: mentioning (confidence: 99%)
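For context on how such pre-trained Amharic models are typically consumed, here is a hedged sketch using the official fastText Python bindings to fetch the publicly released Amharic vectors; the cc.am.300.bin model name and the download_model helper come from fastText's public distribution and are not artifacts of the cited work.

```python
# Sketch: loading the publicly released Amharic fastText vectors
# (pip install fasttext). The cc.am.300.bin name and download helper
# follow fastText's published distribution; assumptions, not the
# cited paper's artifacts. Requires network access on first run.
import fasttext
import fasttext.util

fasttext.util.download_model("am", if_exists="ignore")  # fetches cc.am.300.bin
ft = fasttext.load_model("cc.am.300.bin")

vec = ft.get_word_vector("ሰላም")  # 300-d embedding for "peace / hello"
print(vec.shape)
print(ft.get_nearest_neighbors("ሰላም")[:3])  # (score, word) pairs
```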