Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities An 2019
DOI: 10.18653/v1/w19-2516

Sign Clustering and Topic Extraction in Proto-Elamite

Abstract: We describe a first attempt at using techniques from computational linguistics to analyze the undeciphered proto-Elamite script. Using hierarchical clustering, n-gram frequencies, and LDA topic models, we both replicate results obtained by manual decipherment and reveal previously-unobserved relationships between signs. This demonstrates the utility of these techniques as an aid to manual decipherment.
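As a rough illustration of the n-gram frequency analysis named in the abstract, the sketch below counts contiguous sign n-grams over a toy corpus and keeps those that occur more than once. The sign labels, the `texts` list, and the helper names `ngram_counts` and `repeated_ngrams` are invented for this example. They are not taken from the proto-Elamite corpus or from the authors' code.

```python
from collections import Counter


def ngram_counts(texts, n):
    """Count contiguous sign n-grams over a list of tokenized texts."""
    counts = Counter()
    for signs in texts:
        # Slide a window of width n across each text.
        for i in range(len(signs) - n + 1):
            counts[tuple(signs[i:i + n])] += 1
    return counts


def repeated_ngrams(texts, n, min_count=2):
    """Keep only the n-grams that occur at least min_count times."""
    return {g: c for g, c in ngram_counts(texts, n).items() if c >= min_count}


# Hypothetical toy corpus: each inner list is one tablet's sign sequence.
texts = [
    ["A", "B", "C", "N1"],
    ["A", "B", "D", "N2"],
    ["E", "C", "N1"],
]

repeated_bigrams = repeated_ngrams(texts, 2)
repeated_trigrams = repeated_ngrams(texts, 3)
print(repeated_bigrams)
print(repeated_trigrams)
```

Even in this toy corpus no trigram repeats, while two bigrams do. That is the kind of sparsity that makes frequency-based analysis harder as n grows.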

Cited by 4 publications (6 citation statements)
References: 11 publications
“…One such script is Proto-Elamite, which has been researched with computational methods in various articles. In Born et al (2019) the authors propose a hierarchical clustering and a topic modelling approach for Proto-Elamite and manage to produce results that were already obtained manually, as well as new findings. In order to perform a clustering of the signs, the authors use the transcriptions of the signs and propose three different techniques:…”
Section: State-of-the-art In Computational Paleography; type: mentioning
confidence: 99%
“…Hierarchical clustering, n-gram frequencies and latent dirichlet allocation (LDA) topic models were employed. Results were achieved by revealing previously-unobserved relationships of signs and manual deciphering [6]. Here, clustering different sign letters were provided.…”
Section: Literature Review; type: mentioning
confidence: 99%
“…Such similarly functioning signs might obtain similar embeddings, but retaining their distinction in the published transliterations still improves our understanding of the texts. However for both manual and machine-learning analysis, significant reductions in the sign list may open new avenues for decipherment: for instance, Born et al (2019) note that frequency-based approaches to decipherment are currently difficult in PE owing to the very small number of repeated n-grams in the corpus.…”
Section: Below); type: mentioning
confidence: 99%
“…Their task benefits from the existence of supervised Sumerian training data. Born et al (2019) train topic models on PE texts and cluster PE signs in a simple mutual information-based embedding model. The present work considers more sophisticated embedding models and performs a more detailed investigation of the embedding space.…”
Section: Below); type: mentioning
confidence: 99%
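The last statement above describes the sign clustering as resting on a mutual information-based embedding. A minimal sketch of one such quantity, pointwise mutual information (PMI) between signs that appear in the same text, might look like the following. It assumes co-occurrence means "present on the same tablet", which may differ from the window the authors used. The toy corpus and the function name `pmi_table` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(texts):
    """PMI for each unordered sign pair that co-occurs in at least one text.

    Assumption: two signs co-occur when both appear anywhere in the same text.
    """
    n_texts = len(texts)
    sign_df = Counter()  # number of texts containing each sign
    pair_df = Counter()  # number of texts containing both signs of a pair
    for signs in texts:
        present = sorted(set(signs))
        sign_df.update(present)
        pair_df.update(combinations(present, 2))

    pmi = {}
    for (a, b), joint in pair_df.items():
        p_ab = joint / n_texts
        p_a = sign_df[a] / n_texts
        p_b = sign_df[b] / n_texts
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


# Hypothetical toy corpus: each inner list is one tablet's sign sequence.
texts = [["A", "B"], ["A", "B"], ["C", "D"], ["A", "C"]]
pmi = pmi_table(texts)
for pair, score in sorted(pmi.items()):
    print(pair, round(score, 3))
```

Treating each sign's row of PMI scores as a vector gives a simple embedding that a hierarchical clustering routine could then group. The sketch stops short of that step.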