Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua 2021
DOI: 10.18653/v1/2021.naacl-main.317

Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation

Abstract: Understanding voluminous historical records provides clues on the past in various aspects, such as social and political issues and even natural science facts. However, it is generally difficult to fully utilize the historical records, since most of the documents are not written in a modern language and part of the contents are damaged over time. As a result, restoring the damaged or unrecognizable parts as well as translating the records into modern languages are crucial tasks. In response, we present a multi-…

Cited by 9 publications (9 citation statements)
References 33 publications

“…There is a significant perplexity difference between the ground truth nKo translation and oKo, which means the gt-nKo translation is closer to the modern language than the gt-oKo. Our generated translations show a lower perplexity than the gt-oKo and Kang et al (2021); it is closer to the modern language similar to gt-nKo.…”
Section: Absolute Evaluation
Citation type: mentioning
confidence: 65%
“…To translate AJD with the neural network, Park et al (2020) proposes a new subword tokenization method called the share vocabulary and entity restriction byte pair encoding. Kang et al (2021) presents a multi-task learning approach that restoring and translating the historical documents. For the restoration task, they used the untranslated Diaries of the Royal Secretariat (DRS) data, Korean historical documents written in Hanja.…”
Section: Neural Machine Translation For the Annals Of The Joseon Dynasty
Citation type: mentioning
confidence: 99%
“…Inspired by biological neural networks, deep neural networks can discover and harness intricate statistical patterns in vast quantities of data [10]. Recent increases in computational power have enabled these models to tackle challenges of growing sophistication in many fields [11][12][13][14], including the study of ancient languages [15][16][17][18].…”
Section: Deep Learning For Epigraphy
Citation type: mentioning
confidence: 99%
“…The closest work to Ithaca is our 2019 research on ancient text restoration: Pythia [15]. Pythia was to our knowledge the first ancient text restoration model to use deep neural networks, and was followed by blank language models [18], Babylonian [65] and Korean text translation and restoration [17], Latin BERT for language modelling, part-of-speech tagging, word sense disambiguation and word similarity [16], and the classification of Cuneiform tablets by period [66].…”
Section: Previous Work
Citation type: mentioning
confidence: 99%