Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval
DOI: 10.1145/3342827.3342846

Evaluation of Morphological Embeddings for the Russian Language

Abstract: This paper evaluates morphology-based embeddings for the English and Russian languages. Despite the interest in morphology-based word embedding models, several of which have been introduced in the past with claimed performance improvements on word-similarity and language-modeling tasks, our experiments show no stable advantage over two baseline models, SkipGram and FastText: the performance of the morphological embeddings lies between that of the two baselines.
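To make the comparison concrete, here is a minimal sketch (our own, not the authors' code) of how such a baseline evaluation can be run with gensim: train SkipGram and FastText on the same corpus and score both against a human-judged word-similarity benchmark. The corpus path and the benchmark file name are hypothetical placeholders.

```python
from gensim.models import Word2Vec, FastText
from gensim.models.word2vec import LineSentence

# Hypothetical corpus: one tokenized sentence per line.
corpus = LineSentence("corpus.txt")

# SkipGram baseline (sg=1 selects skip-gram over CBOW).
sg = Word2Vec(corpus, vector_size=300, sg=1, window=5, min_count=5, epochs=5)

# FastText baseline: same objective, but adds character n-gram (subword) vectors.
ft = FastText(corpus, vector_size=300, sg=1, window=5, min_count=5, epochs=5)

# Word-similarity evaluation: Spearman correlation against human judgments.
# "ru_simlex.tsv" is a hypothetical tab-separated file: word1, word2, score.
for name, model in [("SkipGram", sg), ("FastText", ft)]:
    pearson, spearman, oov = model.wv.evaluate_word_pairs("ru_simlex.tsv")
    print(f"{name}: Spearman rho = {spearman[0]:.3f}, OOV = {oov:.1f}%")
```

FastText differs from SkipGram only in its subword vectors, which is what makes the pair a natural bracket for morphology-aware models.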

Cited by 4 publications (2 citation statements). References 23 publications.

“…This fine-tuning made it possible not only to preserve the features of both corpora but also to use available pre-trained models with the minimum of changes. Even though multiple experiments with various types of word embeddings for inflected and agglutinative languages (Üstün et al., 2018; Romanov and Khusainova, 2019) have shown that morphological subword embeddings perform very well for the Slavic languages, these distributed word representations are still limited by their static nature, i.e., their inability to change depending on the context. However, new information-extraction opportunities arrived with the introduction of deep contextualized word embeddings such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2019).…”
Section: Techniques
Citation type: mentioning
Confidence: 99%
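The static-versus-contextual limitation described in this excerpt can be illustrated with a short sketch (our own, not from the cited work): a BERT model assigns different vectors to the same surface form in different sentences, which no static embedding can do. The checkpoint name is the standard multilingual BERT; the example sentences are our own.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
bert.eval()

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Mean of the final hidden states of the subword pieces of `word`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    pieces = tok(word, add_special_tokens=False)["input_ids"]
    seq = enc["input_ids"][0].tolist()
    # Locate the first occurrence of the word's subword pieces in the sentence.
    for i in range(len(seq) - len(pieces) + 1):
        if seq[i:i + len(pieces)] == pieces:
            return hidden[i:i + len(pieces)].mean(dim=0)
    raise ValueError("word not found in sentence")

# "ключ" means "key" in the first sentence and "spring (of water)" in the second.
v1 = word_vector("Он потерял ключ от двери.", "ключ")
v2 = word_vector("Из-под земли бьёт холодный ключ.", "ключ")
print(torch.cosine_similarity(v1, v2, dim=0).item())  # < 1.0: the vector depends on context
```

A static SkipGram or FastText model would return the identical vector for both occurrences, which is exactly the limitation the citing authors point out.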
“…Moreover, the aforementioned language-processing tasks affect the performance of ML algorithms [25][26][27]. In this context, BERT appears to offer a way to capture the contextual meaning of morphologically complex words [28][29][30] without those preprocessing steps. This is one of the primary motivations of this study.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
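As a small illustration of the point made in this excerpt (again our own sketch, not from the cited study): BERT's WordPiece tokenizer decomposes a morphologically complex Russian word into subword pieces with no separate lemmatization or morphological-analysis step. The example word is our own choice.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
# "преподавательница" = "(female) teacher", built from several morphemes.
print(tok.tokenize("преподавательница"))
# Output is a list of WordPiece fragments, e.g. ['пре', '##пода', ...];
# the split is frequency-driven and need not match true morpheme boundaries.
```

That last caveat is why evaluations like the one in this paper remain relevant: data-driven subword splits are not the same thing as genuine morphological segmentation.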