2016
DOI: 10.1007/978-3-319-46681-1_52

LDA-Based Word Image Representation for Keyword Spotting on Historical Mongolian Documents

Cited by 13 publications (3 citation statements)
References 15 publications
“…Unsupervised algorithms mainly sort the keyword weights through some specified indicators and select keywords on the basis of the sorting results. Among them are representative TF-IDF based on statistical features [1,2], TextRank based on word graph model [3,4], and Latent Dirichlet Allocation (LDA) based on topic model [6]. To optimize the effect of algorithm extraction, Luo et al [7] derived the calculation formula for the number of words of the same frequency in the text through Zipf's law and then determined the proportion of each frequency word in the text by using the calculation formula for the number of words of the same frequency.…”
Section: Related Work (mentioning)
confidence: 99%
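
The statement above describes unsupervised extraction as scoring each word with some indicator and keeping the top-ranked words. A minimal sketch of that idea with TF-IDF as the indicator follows. The function name `tfidf_keywords`, the whitespace tokenizer, and the toy corpus are illustrative assumptions, not taken from the cited papers.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by TF-IDF and return the top_k.

    Hypothetical helper for illustration: tokenization is a plain
    lowercase whitespace split, and IDF is log(N / df) without smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    tokens = tokenized[doc_index]
    tf = Counter(tokens)
    scores = {
        word: (count / len(tokens)) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
top_words = tfidf_keywords(docs, 0, top_k=2)
```

A word that occurs in every document (here "the") gets an IDF of zero, so it drops to the bottom of the ranking regardless of how often it appears.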
“…The importance of words with no representativeness is reduced in high-frequency words to the text, and then the accuracy of keyword extraction is improved. TextRank [3,4], which is based on network graph, is a classic unsupervised keyword extraction method. This method decomposes the content of a single document into a network graph model by word segmentation and extracts keywords by considering the structural features and word frequency features of the document.…”
Section: Introduction (mentioning)
confidence: 99%
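
The graph-based procedure described in the statement above can be sketched roughly as follows: link words that co-occur within a small window, then score the nodes with a PageRank-style iteration. This is a simplified, unweighted variant written for illustration; the function name `textrank_keywords` and the parameter defaults (`window`, `damping`, `iterations`) are assumptions, and real implementations usually add part-of-speech filtering and edge weights.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Return the top_k words of a token list ranked by a TextRank-style score.

    Hypothetical helper for illustration: edges are unweighted and
    undirected, and the iteration count is fixed rather than
    convergence-tested.
    """
    # Build the co-occurrence graph: each word is linked to the
    # distinct words among the next `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update: each node shares its score equally
    # among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


tokens = "graph model graph keyword graph document graph".split()
top_words = textrank_keywords(tokens, window=1, top_k=1)
```

In the toy input, "graph" is adjacent to every other word, so it becomes the hub of the co-occurrence graph and receives the highest score.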
“…Topic modeling originates from early latent semantic analysis (LSA), which aims to discover meaningful semantic structures in the corpus [18], with a focus on keyword extraction. The representative approaches are through the use of TF-IDF, which is based on statistical features [19,20], TextRank, based on word graph models [21,22], and Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA), based on topic models [23]. PLSA and LDA are the most widely used probabilistic techniques in topic modeling [24].…”
Section: Introduction (mentioning)
confidence: 99%