2020
DOI: 10.3390/app10217939

Korean Historical Documents Analysis with Improved Dynamic Word Embedding

Abstract: Historical documents refer to records or books that provide textual information about the thoughts and consciousness of past civilisations, and therefore, they have historical significance. These documents are used as key sources for historical studies as they provide information over several historical periods. Many studies have analysed various historical documents using deep learning; however, studies that employ changes in information over time are lacking. In this study, we propose a deep-learning approac…

Cited by 3 publications (4 citation statements)
References: 36 publications

“…We note that the text pre-processing of our works is only based on preliminary syllable analysis. Recent techniques such as tokenization, word-embedding, multitask learning, and Bidirectional Encoder Representations from Transformers (BERT) [31][32][33][34] have not been prepared for old Korean characters yet. If the text tokenization or embedding is available for old Korean, one could remove input nouns such as the name of character and places, etc., and perform an intensive study based on corpus.…”
Section: Discussion (mentioning, confidence: 99%)

“…Text categorization algorithms have been successfully applied to Korean/French/Arabic/Tigrinya/Chinese languages for document/tweets classification (Kozlowski et al 2020), (Jin et al 2020). CNN with the CBOW model achieves an accuracy of 93.41% for classifying text in the Tigrinya language (Fesseha et al 2021).…”
Section: Review On Text Analytics Word Embedding Application and Deep... (mentioning, confidence: 99%)

“…Pan et al (2019a) Improve text classification by transforming knowledge from one domain to another Netease and Cnews are two public Chinese text classification datasets, English text datasets Yahoo dataset SVM, LSTM TF-IDF, BOW, Word2Vec LSTM + Word2Vec achieves an accuracy of 90.07%
23. Jin et al (2020) Korean historical documents analysis Korean historical documents Dynamic word embedding approach BERT NER task achieves an F1-score of 68%
24. Fesseha et al (2021) Low-Resource Languages: Tigrinya Tigrinya news datasets CNN fastText Word2Vec (CBOW, Skip-Gram) CNN + CBOW achieves an accuracy of 93.41%
25.…”
Section: Appendix A (mentioning, confidence: 99%)

“…However, there has been no research attempting to propose language models in Hanja, which is a dead language in Korea but absolutely necessary to explore Korean history. Most of the studies with Hanja only shed lights on translating historical Hanja documents and use AJD as their corpus (Park et al, 2020; Jin et al, 2020; Kang et al, 2021).…”
Section: Related Work (mentioning, confidence: 99%)