2016
DOI: 10.1051/itmconf/20160705001

Text Clustering Algorithm Based on Random Cluster Core

Abstract: Clustering has become a popular text mining technique, but large data volumes place higher demands on both the accuracy and the performance of text mining. To address the performance bottleneck of traditional text clustering algorithms, this paper proposes a text clustering algorithm with random features. It is a density-based text clustering algorithm that also applies neighbor-based heuristic rules and introduces the concept of a random cluster, which effectively reduces…
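The abstract is truncated and does not spell out the procedure, so what follows is only a hedged sketch of one way density-based text clustering with randomly chosen cluster cores could look. It assumes DBSCAN-style expansion: documents become term-frequency vectors compared by cosine similarity, and candidate core documents are visited in random order. The names `random_core_clustering`, `eps`, `min_pts`, and `seed` are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def tf_vector(text):
    """Term-frequency vector (assumed preprocessing: lowercase + whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def random_core_clustering(texts, eps=0.5, min_pts=2, seed=0):
    """Hypothetical density-based clustering with randomly ordered core selection.

    Returns one label per text; -1 marks noise.
    """
    vecs = [tf_vector(t) for t in texts]
    n = len(vecs)
    # Neighbourhood of i: every document (including i) with similarity >= eps.
    neighbours = [
        [j for j in range(n) if cosine(vecs[i], vecs[j]) >= eps] for i in range(n)
    ]
    labels = [None] * n
    order = list(range(n))
    random.Random(seed).shuffle(order)  # candidate cores are tried in random order

    cluster = 0
    for i in order:
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        # i is a dense core: start a new cluster and expand through its neighbours.
        labels[i] = cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned (border points are not expanded)
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                frontier.extend(neighbours[j])  # j is also a core: keep growing
        cluster += 1
    return labels


if __name__ == "__main__":
    docs = [
        "apple banana fruit",
        "banana apple smoothie",
        "fruit apple banana",
        "stock market shares",
        "market shares trading",
        "weather",
    ]
    print(random_core_clustering(docs))
```

On this toy corpus the three fruit documents share one label, the two market documents share another, and the lone "weather" document is labeled -1 (noise). The cluster numbers themselves depend on `seed`; in general, a border document reachable from two clusters can also change assignment with the visiting order, which is one reason a random-core method would need a rule for such ties.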

Cited by 2 publications (2 citation statements)
References 6 publications (3 reference statements)

“…The main reason is that the computational overhead of clustering algorithms tends to be large. When the amount of data rises to a certain extent, most clustering algorithms cannot be used, so the time complexity of most clustering algorithms needs to be considered [132]. K-means, which belongs to the partitioning clustering algorithm, is a commonly used text clustering algorithm whose disadvantage is that it cannot effectively determine the number of clusters and select the initial clustering point, and has poor performance on high dimensional data, etc.…”
Section: Text Clustering
mentioning
confidence: 99%
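The cited passage names two practical weaknesses of K-means: the number of clusters has to be supplied up front, and the result depends on the choice of initial cluster centers. A minimal pure-Python sketch of standard Lloyd-style K-means (not code from either paper) makes the second point concrete. The helpers `kmeans` and `sse` and the one-dimensional toy points are hypothetical illustrations; real text clustering would run on high-dimensional document vectors.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's algorithm; k is fixed by the number of initial centers."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: centers stopped moving
        centers = new_centers
    return centers, clusters


def sse(centers, clusters):
    """Within-cluster sum of squared errors (lower is better)."""
    return sum(sq_dist(p, c) for c, members in zip(centers, clusters) for p in members)


if __name__ == "__main__":
    points = [(0,), (1,), (100,), (101,), (200,), (201,)]
    for init in ([(0,), (100,), (200,)], [(0,), (1,), (150,)]):
        centers, clusters = kmeans(points, init)
        print(init, "->", centers, "SSE =", sse(centers, clusters))
```

With the well-spread start (0, 100, 200) the algorithm finds the three obvious groups with an SSE of 1.5. Starting from (0, 1, 150) it converges after two iterations to a poor local optimum that splits {0, 1} and lumps the other four points together, with an SSE of 10001.0. Common mitigations are k-means++ seeding and keeping the best of several restarts; neither removes the need to choose k.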
“…In this paper, we adapt the delayed combination approach and analyze its effectiveness by comparing it to the existing, commonly used, early combination approach, inspired by Huang et al, 2016. We adapt character-level CNN or LSTM-based word encoding and recent contextualized word embedding and designed CNN-based sentence encoding using a named entity dictionary as supplementary feature encodings, in addition to the common pre-trained word embedding. We pass the pre-trained word embedding and the contextualized word embedding through the separate bidirectional LSTM blocks, respectively, and then we combine the outputs with the CNN or LSTM-based word encoding and the CNN-based sentence encoding.…”
mentioning
confidence: 99%