Sign Clustering and Topic Extraction in Proto-Elamite

Born, Logan; Kelley, Kate; Kambhatla, Nishant; Chen, Carolyn; Sarkar, Anoop

doi:10.18653/v1/w19-2516

Cited by 4 publications

(6 citation statements)

References 11 publications

“…One such script is Proto-Elamite, which has been researched with computational methods in various articles. In Born et al (2019) the authors propose a hierarchical clustering and a topic modelling approach for Proto-Elamite and manage to produce results that were already obtained manually, as well as new findings. In order to perform a clustering of the signs, the authors use the transcriptions of the signs and propose three different techniques:…”

Section: State-of-the-art In Computational Paleography

mentioning

confidence: 99%

Computational methods for undeciphered scripts

Corazza

2024

The study of ancient, undeciphered scripts through computational means presents unique challenges that depend both on the nature of the problem and on the peculiarities of each writing system. This volume presents two computational approaches that were successfully applied to two writing systems from the Aegean and Cyprus; the success of these endeavors paves the way for new discoveries and methods. The first part features a discussion of the Linear A and Cypro-Minoan writing systems, as well as a background of the computational approaches used. The description of the paleographic and technical aspects is aimed at scholars of both disciplines and provides an extensive background, which is crucial to understanding the goals and methods of this study. The second part is a discussion of the experimental results, which includes a proposed decipherment of the Linear A fractions. Further, the experiments on Cypro-Minoan demonstrate that, contrary to previous hypotheses, it is a single writing system, rather than comprising three separate systems. The two experiments used completely different computational methods, since the method used to decipher Linear A is based on constraint programming, while the Cypro-Minoan experiments are based on a deep learning model. Michele Corazza is a research fellow at the University of Bologna in the field of natural language processing. After obtaining his master’s degree in computer science, he joined the WIMMICS team in INRIA Sophia Antipolis, France, working as a research engineer on the CREEP project, focused on the detection and prevention of cyberbullying online. During this collaboration he developed machine learning models that detect hate speech and cyberbullying on social networks in a multilingual setting. In 2019 he started a PhD at the University of Bologna, joining the INSCRIBE ERC project, which investigates the origin of writing. 
His PhD studies focused on the development of computational methods based on writing systems in the Aegean and Cyprus, in particular Linear A and Cypro-Minoan. After defending his thesis in 2023, he joined the HyperModeLex ERC project, which is focused on developing AI models to aid in the European legislative process.

“…Hierarchical clustering, n-gram frequencies and latent dirichlet allocation (LDA) topic models were employed. Results were achieved by revealing previously-unobserved relationships of signs and manual deciphering [6]. Here, clustering different sign letters were provided.…”

Section: Literature Review

mentioning

confidence: 99%

Translating cuneiform symbols using artificial neural network

2021

The cuneiform language is an old language invented by the Sumerian people. It is essential for many archaeologists, especially those interested in studying the old nations of Iraq. Working with this type of language usually requires a specialist to translate its symbols, which are essentially nail-shaped forms. This study presents a new approach to translating cuneiform writing using an artificial neural network (ANN). Specifically, a multi-layer perceptron (MLP) neural network has been adapted to translate Sumerian cuneiform symbol images to their corresponding English letters. This work has been successfully established and it attained 100%.

“…Such similarly functioning signs might obtain similar embeddings, but retaining their distinction in the published transliterations still improves our understanding of the texts. However for both manual and machine-learning analysis, significant reductions in the sign list may open new avenues for decipherment: for instance, Born et al (2019) note that frequency-based approaches to decipherment are currently difficult in PE owing to the very small number of repeated n-grams in the corpus.…”

Section: Below)

mentioning

confidence: 99%

“…Their task benefits from the existence of supervised Sumerian training data. Born et al (2019) train topic models on PE texts and cluster PE signs in a simple mutual information-based embedding model. The present work considers more sophisticated embedding models and performs a more detailed investigation of the embedding space.…”

Section: Below)

mentioning

confidence: 99%

“…Table 1 summarizes all of the models used in this work and important hyperparameters. We train these models on the PE corpus from Born et al (2019), which is a cleaned version of texts originally published by the Cuneiform Digital Library Initiative (CDLI). This contains digitized transliterations from 1399 tablets comprising 11013 lines in total, or 33778 tokens.…”

mentioning

confidence: 99%

Compositionality of Complex Graphemes in the Undeciphered Proto-Elamite Script using Image and Text Embedding Models

Born, Kelley, Monroe et al.

2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Self Cite

We introduce a language modeling architecture which operates over sequences of images, or over multimodal sequences of images with associated labels. We use this architecture alongside other embedding models to investigate a category of signs called complex graphemes (CGs) in the undeciphered proto-Elamite script. We argue that CGs have meanings which are at least partly compositional, and we discover novel rules governing the construction of CGs. We find that a language model over sign images produces more interpretable results than a model over text or over sign images and text, which suggests that the names given to signs may be obscuring signals in the corpus. Our results reveal previously unknown regularities in proto-Elamite sign use that can inform future decipherment efforts, and our image-aware language model provides a novel way to abstract away from biases introduced by human annotators.
