2020
DOI: 10.3390/e22050556

Renormalization Analysis of Topic Models

Abstract: In practice, to build a machine learning model of big data, one needs to tune model parameters. The process of parameter tuning involves an extremely time-consuming and computationally expensive grid search. However, the theory of statistical physics provides techniques that allow us to optimize this process. The paper shows that a function of the output of topic modeling demonstrates self-similar behavior under variation of the number of clusters. Such behavior allows the use of a renormalization technique. A combinati…
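The renormalization idea in the abstract amounts to coarse-graining an already fitted topic solution instead of re-fitting the model for every candidate number of topics. Below is a minimal Python sketch of one such coarse-graining step, assuming topics are merged by averaging their word distributions; the function name and the merge rule are illustrative assumptions, not taken verbatim from the paper.

```python
import numpy as np

def merge_topics(phi: np.ndarray, i: int, j: int) -> np.ndarray:
    """One coarse-graining (renormalization) step on a topic solution.

    phi : (W, T) matrix whose columns are the word distributions of T
          topics (each column sums to 1). Topics i < j are merged,
          giving a (W, T-1) matrix. Averaging the two columns keeps
          the merged column normalized.
    """
    assert i < j
    merged = (phi[:, i] + phi[:, j]) / 2.0  # sum of two distributions, renormalized
    phi = np.delete(phi, j, axis=1)         # drop topic j (returns a new array)
    phi[:, i] = merged                      # replace topic i with the merged topic
    return phi
```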

Cited by 6 publications (7 citation statements) · References 39 publications
“…Thus, the complex system's entropy differences can be measured to discover when an information maximum is reached. Koltcov's research considers entropy as negative information; thus, the maximum entropy corresponds to the minimum of information [14,15]. Therefore, the value of T corresponding to the smallest entropy value can be considered the "true number of topics," representing the maximum valid information generated by the topic model.…”
Section: Step 3: Rényi Entropy with Renormalization Analysis
confidence: 99%
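The selection rule the citing authors describe reduces to taking the argmin of the entropy curve over candidate topic numbers. A toy Python illustration, with a placeholder entropy curve standing in for values computed from actual topic solutions:

```python
import numpy as np

# Placeholder Renyi-entropy curve over candidate topic numbers T = 2..50;
# in practice each value would be computed from a fitted topic solution.
T_grid = np.arange(2, 51)
entropy = np.abs(np.log(T_grid / 15.0))  # toy curve with a minimum near T = 15

T_true = int(T_grid[np.argmin(entropy)])  # entropy minimum = estimated "true" T
print(T_true)  # -> 15
```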
“…Based on Koltcov's research [14], the Rényi entropy can be expressed as follows:

S_q^R = \frac{\ln(Z_q)}{q - 1},

where q = 1/T is called the deformation parameter and T is the number of topics. Z_q is the partition function of a topic solution, which is shown below:…”
Section: Step 3: Rényi Entropy with Renormalization Analysis
confidence: 99%
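A hedged Python sketch of this computation, following the definitions in Koltcov's Rényi-entropy papers as I read them: the energy is E = −ln(P̃), where P̃ is the average probability mass of words above the uniform level 1/W, the density of states is ρ = Ñ/(WT), and the partition function becomes Z_q = P̃^q · ρ. All variable names are mine, and the exact form of Z_q is an assumption.

```python
import numpy as np

def renyi_entropy(phi: np.ndarray) -> float:
    """Renyi entropy of a topic solution, per Koltcov-style definitions.

    phi : (W, T) topic-word matrix; column t is the word distribution
    of topic t. Words with probability above the uniform level 1/W
    are treated as informative.
    """
    W, T = phi.shape
    q = 1.0 / T                      # deformation parameter
    mask = phi > 1.0 / W             # above-uniform-threshold entries
    P_tilde = phi[mask].sum() / T    # average informative mass; energy E = -ln(P_tilde)
    rho = mask.sum() / (W * T)       # density-of-states estimate; S = ln(rho)
    Z_q = (P_tilde ** q) * rho       # partition function: exp(-q*E + S)  [assumption]
    return float(np.log(Z_q) / (q - 1.0))
```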
“…From a text mining perspective, topics in a text corpus can be viewed as probability distributions over the terms present in the corpus, or as clusters that assign weights to those terms [1,2]. The Latent Dirichlet Allocation (LDA) model uses a probabilistic generative model to produce topics [3,4]. The principle is that each document is assumed to be generated from multiple topics according to a random probability distribution, and each topic is in turn a random probability distribution over words.…”
Section: Introduction
confidence: 99%
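For concreteness, this is one standard way to fit such an LDA topic solution with scikit-learn; row-normalizing `components_` gives the topic-word distributions (the corpus here is a stand-in, and note the rows-are-topics orientation, the transpose of the (W, T) convention used in the sketches above).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "cats and dogs are common pets",
    "the stock market fell on inflation news",
    "neural topic models cluster words into topics",
]  # stand-in corpus

X = CountVectorizer().fit_transform(docs)   # document-term counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Topic-word distributions (rows are topics; transpose for a (W, T) phi).
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
theta = lda.transform(X)                    # document-topic distributions
```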