2018
DOI: 10.48048/wjst.2019.4133

Short Text Document Clustering using Distributed Word Representation and Document Distance

Abstract: This paper presents a method for clustering short text documents, such as instant messages, SMS, or news headlines. Vocabularies in the texts are expanded using external knowledge sources and represented by a Distributed Word Representation. Clustering is done using the K-means algorithm with Word Mover's Distance as the distance metric. Experiments compared the clustering quality of this method with that of several leading methods, using large datasets from BBC headlines, SearchSnippets, StackExchange, an…
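
The pipeline the abstract outlines (word vectors, a Word Mover's Distance style document distance, then clustering) can be illustrated with a small sketch. This is not the authors' implementation. It makes several assumptions, all labeled in the code: the 2-D embeddings in `EMB` are hypothetical toy values standing in for pretrained vectors; the distance is the relaxed Word Mover's Distance (a cheaper lower bound on the exact distance) rather than the exact optimal-transport solution; the cluster update picks a medoid (the member document closest to all others) instead of a K-means mean centroid, since a mean is not directly defined under this distance; and the vocabulary expansion step is omitted.

```python
import math
from collections import Counter

# Hypothetical 2-D toy embeddings (assumption); a real system would load
# pretrained word vectors instead.
EMB = {
    "stocks": (1.0, 0.0), "market": (0.9, 0.1), "shares": (1.0, 0.1),
    "football": (0.0, 1.0), "goal": (0.1, 0.9), "match": (0.1, 1.0),
}

def nbow(tokens):
    """Normalized bag-of-words weights over in-vocabulary tokens.
    Assumes each document has at least one token present in EMB."""
    counts = Counter(t for t in tokens if t in EMB)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def one_sided(a, b):
    """Cost when every word in `a` sends all its mass to its nearest word in `b`."""
    return sum(wt * min(math.dist(EMB[w], EMB[v]) for v in b)
               for w, wt in a.items())

def rwmd(doc_a, doc_b):
    """Relaxed Word Mover's Distance: the larger of the two one-sided costs."""
    a, b = nbow(doc_a), nbow(doc_b)
    return max(one_sided(a, b), one_sided(b, a))

def k_medoids(docs, k, iters=10):
    """K-medoids style clustering under rwmd (a swapped-in variant of K-means)."""
    medoids = list(range(k))  # deterministic init: the first k documents
    labels = []
    for _ in range(iters):
        # Assignment step: each document joins its nearest medoid.
        labels = [min(range(k), key=lambda c: rwmd(d, docs[medoids[c]]))
                  for d in docs]
        # Update step: the new medoid minimizes total distance to its cluster.
        new = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster is empty
                new.append(medoids[c])
                continue
            new.append(min(members,
                           key=lambda i: sum(rwmd(docs[i], docs[j]) for j in members)))
        if new == medoids:
            break
        medoids = new
    return labels, medoids

if __name__ == "__main__":
    docs = [["stocks", "market"], ["football", "goal"],
            ["shares", "market"], ["match", "goal"]]
    labels, medoids = k_medoids(docs, k=2)
    print(labels, medoids)
```

On the four toy headlines, the two finance-like documents and the two sports-like documents share no words beyond `market` and `goal`, yet they group together because their words lie close in the embedding space, which is the property a bag-of-words distance would miss.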

citations
Cited by 2 publications
(2 citation statements)
references
References 30 publications
“…Similarly, Haj-Yahia et al 38 and Schopf et al 39 semantically matched text to classification labels for unsupervised text classification. Meanwhile, Kongwudhikunakorn et al 40 used word embeddings and the Word Mover’s Distance 41 to accurately cluster documents.…”
Section: Discussion (mentioning)
confidence: 99%
“…The results of experiments conducted on Turkish tweets by using word embeddings are compared with the results where TF-IDF representations are used. Kongwudhikunakorn and Waiyamai (2020) propose a combination of document representation, document distance measure and a document clustering method in order to improve performance in short text clustering. The method includes (1) distributed representation of words for document representation (Mikolov, Sutskever, et al, 2013; …), (2) Word Mover's Distance as the document distance metric (Kusner et al, 2015), and (3) K-means algorithm for document clustering (MacQueen, 1967).…”
Section: Short Text Clustering: Recent Developments For Batch Processing (mentioning)
confidence: 99%