2021
DOI: 10.1016/j.knosys.2021.107342

What is this Cluster about? Explaining textual clusters by extracting relevant keywords

Cited by 4 publications (2 citation statements)
References 25 publications
“…Specifically, this step partitions aspects into K clusters based on their semantic similarity using Google's pretrained Word2vec model and the K-means clustering method. K-means is a widely used distance/centroid-based algorithm, where distances are determined in order to allocate a point to a cluster [53]. The K-means algorithm associates each cluster with a centroid and aims to minimise the sum of the distances between the cluster centroid and the points assigned to the cluster.…”
Section: Extracting and Clustering Aspects (mentioning)
confidence: 99%
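
The assign-then-update loop described in the statement above can be sketched as follows. This is a minimal illustration and not the citing authors' implementation. The function name `kmeans`, the toy 2-D vectors (standing in for Word2vec embeddings of aspects), and the choice of the first K points as starting centroids are all assumptions made for the example. The sketch uses the standard Lloyd iteration, whose mean-update step minimises the sum of squared Euclidean distances.

```python
import math

def kmeans(points, k, iters=100):
    """Minimal Lloyd-style K-means on a list of equal-length numeric tuples."""
    # Assumption for illustration: the first k points are the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: allocate each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy vectors standing in for aspect embeddings (hypothetical data).
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
labels, centroids = kmeans(points, k=2)
```

In practice the embeddings would have hundreds of dimensions and the starting centroids would usually be chosen at random or with a seeding heuristic; the loop itself stays the same.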
“…Common approaches to convert text to vector representation include bag-of-words methods and TF-IDF (term frequency-inverse document frequency), word embedding models such as word2vec (Mikolov et al, 2013) and Global Vectors for Word Representation (GloVe) (Pennington et al, 2014), and transformer models like Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al, 2019). Summarizing and interpreting output document clusters can also be difficult due to the high-dimensionality of text-based data and is an active area of research (Afzali & Kumar, 2019; Penta & Pal, 2021).…”
Section: Introduction (mentioning)
confidence: 99%
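
As a rough illustration of the TF-IDF representation named in the statement above, the sketch below turns a few short documents into weighted term vectors. It assumes one common weighting variant (term frequency normalised by document length, multiplied by the unsmoothed log inverse document frequency). The function name `tfidf`, the whitespace tokenisation, and the sample documents are hypothetical and not taken from the cited work.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, vectors) using tf = count / doc length, idf = log(N / df)."""
    # Naive whitespace tokenisation (an assumption made to keep the sketch short).
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n_docs = len(tokenised)
    # Document frequency: the number of documents that contain each term.
    df = Counter(t for doc in tokenised for t in set(doc))
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors

# Hypothetical mini-corpus.
docs = ["cluster text data", "cluster image data", "explain text cluster"]
vocab, vectors = tfidf(docs)
```

Under this variant a term that occurs in every document, such as "cluster" here, receives a weight of zero, while terms confined to a single document are weighted most heavily, which is one reason TF-IDF weights are a natural starting point for picking keywords that describe a cluster.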