Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing 2017
DOI: 10.18653/v1/w17-1402
Clustering of Russian Adjective-Noun Constructions using Word Embeddings

Abstract: This paper presents a method of automatic construction extraction from a large corpus of Russian. The term 'construction' here means a multi-word expression in which a variable can be replaced with another word from the same semantic class, for example, a glass of [water/juice/milk]. We deal with constructions that consist of a noun and its adjective modifier. We propose a method of grouping such constructions into semantic classes via 2-step clustering of word vectors in distributional models. We compare it w…
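As a rough illustration of the 2-step idea, the sketch below first clusters head nouns by their vectors and then clusters the modifying adjectives within each noun group. This is a hedged reconstruction, not the paper's pipeline: the vocabulary, bigrams, vectors, and cluster counts are all invented stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in embeddings; in practice these come from a distributional model
# (e.g. word2vec vectors trained on a large Russian corpus).
vocab = ["glass", "cup", "bottle", "water", "juice", "milk",
         "red", "green", "cold", "fresh"]
vectors = {w: rng.normal(size=100) for w in vocab}

# Hypothetical adjective-noun bigrams extracted from a corpus.
bigrams = [("cold", "water"), ("fresh", "juice"), ("cold", "milk"),
           ("red", "glass"), ("green", "bottle"), ("red", "cup")]

# Step 1: cluster the head nouns by their vectors.
nouns = sorted({n for _, n in bigrams})
noun_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    np.vstack([vectors[n] for n in nouns]))
noun_cluster = dict(zip(nouns, noun_labels))

# Step 2: within each noun cluster, cluster the modifying adjectives,
# yielding candidate semantic classes of constructions.
for c in set(noun_labels):
    adjs = sorted({a for a, n in bigrams if noun_cluster[n] == c})
    if len(adjs) < 2:
        continue
    adj_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
        np.vstack([vectors[a] for a in adjs]))
    print(f"noun cluster {c}:", {a: int(l) for a, l in zip(adjs, adj_labels)})
```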

Cited by 7 publications (6 citation statements). References: 17 publications.
“…Thus, strictly speaking our results [footnote 1: we overview all previous work on the same test set]. To train the models, [13] used GoogleBooks Ngrams, [8] used an extended COHA corpus, and both [11] and [21] used a subcorpus of COHA, identical to the one used in our experiments. In fact, the setting in [11] is quite similar to our work, though our best model performance is much higher than in [11]; we will further discuss this discrepancy in Section 6.…”
Section: Methods (citation type: mentioning; confidence: 99%)
“…For clustering we used k-means with various values for k and affinity propagation [8]. Affinity propagation has been previously used for various linguistic tasks, such as word sense induction [2,21]. Affinity propagation is based on an incremental graph-based algorithm, partially similar to PageRank.…”
Section: Embeddings Clustering (citation type: mentioning; confidence: 99%)
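A minimal sketch of the two setups named in this statement, assuming scikit-learn and synthetic blobs in place of word vectors: k-means is swept over several values of k (silhouette is an added scoring choice, not from the quote), while affinity propagation receives no k at all.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for word vectors.
X, _ = make_blobs(n_samples=200, centers=5, n_features=50, random_state=0)

# k-means with various values for k, scored by silhouette.
for k in (3, 5, 7, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k-means, k={k}: silhouette={silhouette_score(X, labels):.3f}")

# Affinity propagation: no k is given; the cluster count emerges from the
# message-passing procedure itself.
ap = AffinityPropagation(random_state=0).fit(X)
print("affinity propagation:", len(ap.cluster_centers_indices_), "clusters")
```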
“…Our second clustering algorithm, affinity propagation, has the advantage of finding the number of clusters automatically: it splits the data into exemplars and instances, exemplars being representative tokens of their instances (the non-exemplar tokens in the same cluster). As Pivovarova et al. (2019) point out, 'Affinity Propagation has been previously used for several NLP tasks, including collocation clustering into semantically related classes (Kutuzov et al., 2017) and unsupervised word sense induction (Alagić et al., 2018)'. Given that, just as in the above-cited article, we lacked a gold standard, we used standard hyperparameters as available in the scikit-learn package (Pedregosa et al., 2011).…”
Section: From Words to Concepts (citation type: mentioning; confidence: 99%)
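The exemplar/instance split can be made concrete with scikit-learn's AffinityPropagation under its standard hyperparameters (damping=0.5, preference=None, i.e. the median similarity). The token labels and vectors below are stand-ins, not data from the cited work.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Stand-in token embeddings with some structure so the algorithm converges.
X, _ = make_blobs(n_samples=30, centers=3, n_features=20, random_state=1)
tokens = [f"tok{i}" for i in range(30)]   # hypothetical token labels

# Standard hyperparameters: damping=0.5, preference=None (median similarity).
ap = AffinityPropagation(random_state=0).fit(X)

# Each cluster has one exemplar; the remaining members are its instances.
for c, idx in enumerate(ap.cluster_centers_indices_):
    members = [tokens[i] for i in np.flatnonzero(ap.labels_ == c) if i != idx]
    print(f"cluster {c}: exemplar={tokens[idx]}, instances={members}")
```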
“…Affinity Propagation has previously been used for various language analysis tasks, including collocation clustering into semantically related classes [Kutuzov et al., 2017] and unsupervised word sense induction [Alagić et al., 2018]. The main advantages of the method are that it detects the number of clusters automatically and is able to produce clusters of various sizes.…”
Section: Clustering (citation type: mentioning; confidence: 99%)
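A short sketch of both advantages on synthetic data: no cluster count is passed in, and the resulting clusters come out uneven in size. The preference sweep is an added illustration of a scikit-learn knob, not something the quote discusses.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Blobs with deliberately different spreads, so cluster sizes vary.
X, _ = make_blobs(n_samples=300, centers=6,
                  cluster_std=[0.5, 1.0, 1.5, 0.5, 2.0, 1.0], random_state=0)

# Lower (more negative) preference -> fewer exemplars, i.e. fewer clusters;
# None defaults to the median similarity.
for pref in (None, -50, -200):
    ap = AffinityPropagation(preference=pref, random_state=0).fit(X)
    sizes = np.bincount(ap.labels_)
    print(f"preference={pref}: {len(sizes)} clusters,"
          f" sizes={sorted(sizes.tolist(), reverse=True)}")
```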