2021
DOI: 10.3390/app11156896
Creating Welsh Language Word Embeddings

Abstract: Word embeddings are representations of words in a vector space that models semantic relationships between words by means of distance and direction. In this study, we adapted two existing methods, word2vec and fastText, to automatically learn Welsh word embeddings taking into account syntactic and morphological idiosyncrasies of this language. These methods exploit the principles of distributional semantics and, therefore, require a large corpus to be trained on. However, Welsh is a minoritised language, hence …
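A minimal sketch of the kind of subword-aware training the abstract describes, using gensim's FastText implementation; the corpus file, hyperparameters, and probe word are illustrative assumptions rather than the authors' published configuration.

```python
# Hedged sketch: train subword-aware Welsh embeddings with gensim's FastText.
# "welsh_corpus.txt" (one tokenised sentence per line) is a hypothetical file.
from gensim.models import FastText
from gensim.models.word2vec import LineSentence

sentences = LineSentence("welsh_corpus.txt")

model = FastText(
    sentences,
    vector_size=300,   # embedding dimensionality
    window=5,          # context window size
    min_count=5,       # drop very rare tokens
    sg=1,              # skip-gram objective
    min_n=3,           # minimum character n-gram length
    max_n=6,           # maximum character n-gram length
    epochs=10,
)

# Character n-grams let the model compose vectors for inflected or mutated
# forms it never saw in training, which matters for Welsh morphology.
print(model.wv.most_similar("iaith", topn=5))  # "iaith" = "language"
```

The skip-gram objective is often preferred over CBOW when training data is limited, which is relevant when the training corpus comes from a minoritised language.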

Cited by 4 publications (3 citation statements)
References 21 publications
“…Existing research has focused primarily on using digital language resources for second- and foreign-language learning among students and children [1,28,29]. Very few studies have focused on native and, in particular, minority languages and their e-learning and analysis; the Welsh language is one example [34]. The characteristics of native-language usage compared with second-language usage were examined only within the context of Slovene speakers' preferences for communication in online activities during second-language learning [28].…”
Section: Literature Review
Citation type: mentioning (confidence: 99%)
“…Each item of the set is modelled as a finite mixture over an underlying set of topic probabilities (Sathi & Ramanujapura, 2016). To derive an accurate categorization of user feedback, the authors used the document-embedding widget before topic modelling to obtain an embedding for each n-gram, employing the pre-trained fastText model for English and obtaining one vector per document (Alghamdi & Alfalqi, 2015; Corcoran et al., 2021). The LDAvis and multidimensional scaling (MDS) tools were used to evaluate the topics that emerged from the LDA.…”
Section: Thematic Analysis Of Positive and Negative Reviews
Citation type: mentioning (confidence: 99%)
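To make the two steps in this excerpt concrete, the sketch below derives one pre-trained fastText vector per document and then fits an LDA model whose topics are rendered with pyLDAvis, whose inter-topic map uses multidimensional scaling. The toy reviews, model path, and topic count are assumptions for illustration, not the cited study's setup.

```python
# Hedged sketch: one fastText vector per document, then LDA inspected with
# pyLDAvis. Toy data and hyperparameters are illustrative assumptions.
import fasttext.util
import gensim
import pyLDAvis.gensim_models

docs = [
    "the app keeps crashing after the latest update",
    "great interface and very easy to navigate",
    "login fails whenever the network is slow",
    "love the new dark mode and the clean layout",
]

# Step 1: one dense vector per document from the pre-trained English model
# (cc.en.300.bin; note this is a large download).
fasttext.util.download_model("en", if_exists="ignore")
ft = fasttext.load_model("cc.en.300.bin")
doc_vectors = [ft.get_sentence_vector(d) for d in docs]

# Step 2: LDA topic model over the same documents; pyLDAvis lays the topics
# out with multidimensional scaling for visual evaluation.
tokens = [d.split() for d in docs]
dictionary = gensim.corpora.Dictionary(tokens)
bow = [dictionary.doc2bow(t) for t in tokens]
lda = gensim.models.LdaModel(bow, num_topics=2, id2word=dictionary, passes=10)
vis = pyLDAvis.gensim_models.prepare(lda, bow, dictionary)
pyLDAvis.save_html(vis, "lda_topics.html")
```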
“…In tasks like predicting lexical complexity, leveraging Transformer models in conjunction with various traditional linguistic features has proven effective in enhancing the performance of deep learning systems [10]. Similarly, accommodating syntactic and morphological peculiarities is crucial, especially for languages like Welsh, a minority language, which necessitates adaptations to existing word embedding methods for optimal results [11]. Hence, the imperative lies in crafting specific word embedding methodologies tailored to the nuances of particular texts or tasks, recognizing the intricate interplay between linguistic structure and neural representations.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
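The morphological point in this excerpt can be made concrete with fastText's subword mechanism: a vector for an unseen mutated or inflected Welsh form is composed from its character n-grams. The toy corpus and word pair below are illustrative assumptions only.

```python
# Hedged sketch: fastText composes vectors for out-of-vocabulary forms from
# character n-grams, which suits Welsh mutations (e.g. "cymraeg" -> "gymraeg").
from gensim.models import FastText

# Tiny toy corpus; a real model would be trained on a large Welsh corpus.
corpus = [
    ["mae", "hi", "yn", "siarad", "cymraeg"],
    ["rydw", "i", "yn", "dysgu", "cymraeg"],
]
model = FastText(corpus, vector_size=50, min_count=1, min_n=3, max_n=5, epochs=50)

# "gymraeg" (soft mutation of "cymraeg") never occurs in the corpus, yet its
# overlapping n-grams still yield a usable vector.
print("gymraeg" in model.wv.key_to_index)         # False: out of vocabulary
print(model.wv.similarity("cymraeg", "gymraeg"))  # computable via shared n-grams
```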