One format to rule them all – The emtsv pipeline for Hungarian

Indig, Balázs; Sass, Bálint; Simon, Eszter; Mittelholcz, Iván; Vadász, Noémi; Makrai, Márton

doi:10.18653/v1/w19-4018

Cited by 10 publications

(9 citation statements)

References 9 publications

“…The poems in raw TXT were tokenized, lemmatized, and morphologically analyzed by means of the emtsv system (Indig et al. 2019), also known as E-magyar (Váradi et al. 2018). In addition, they were phonetically transcribed using the eSpeak synthesizer.…”

Section: Data and Annotation (mentioning)

confidence: 99%

Rhyme in 16th-Century Hungarian Historical Songs: A Pilot Study

Maróthy, Seláf, Plecháč

2022

Tackling the Toolkit: Plotting Poetry Through Computational Literary Studies

This article presents a computer-based stichometric analysis of 26 Hungarian historical songs from the 16th century. We explore the validity of comments made by Albert Szenci Molnár in 1607 about the poor quality and simplicity of stanza structures in the poetry of previous generations. The study shows how rhyming changed in this poetic genre between 1539 and 1598. In this respect, it is the first work to explore these changes through a quantitative analysis. We find that during the examined period, there was a marked decline in the frequency of rhymes based on the repetition of the same word. At the same time, the tendency to maintain a rhyme across multiple stanzas did not change significantly.

“…This was done as follows: the 300K subset of the 2020 Hungarian news subcorpus was downloaded from the Leipzig Corpora Collection by Goldhahn, Eckart & Quasthoff (2012). Morphological and syntactic dependency analysis were performed on these sentences using the emagyar text processing system by Indig et al. (2019) and Váradi et al. (2018). This allowed to annotate the sentences as follows:…”

Section: Linguistic Probing Tasks (mentioning)

confidence: 99%

BiVaSE: A bilingual variational sentence encoder with randomly initialized Transformer layers

Nyéki

2022

ALing

Transformer-based NLP models have achieved state-of-the-art results in many NLP tasks including text classification and text generation. However, the layers of these models do not output any explicit representations for text units larger than tokens (e.g. sentences), although such representations are required to perform text classification. Sentence encodings are usually obtained by applying a pooling technique during fine-tuning on a specific task. In this paper, a new sentence encoder is introduced. Relying on an autoencoder architecture, it was trained to learn sentence representations from the very beginning of its training. The model was trained on bilingual data with variational Bayesian inference. Sentence representations were evaluated in downstream and linguistic probing tasks. Although the newly introduced encoder generally performs worse than well-known Transformer-based encoders, the experiments show that it was able to learn to incorporate linguistic information in the sentence representations.

“…The TEI XML files contain not only the text of the poems but, among other types of annotations, the lemma, the part of speech and the morphosyntactic features of words as well. These grammatical annotations have been created by the program e-magyar, an NLP tool for the automatic analysis of the grammatical features of Hungarian texts (Váradi et al. 2018; Indig et al. 2019). The research corpus containing the texts of 23 Hungarian poets has 11,262 poems and 2,120,996 words.…”

Section: Corpus and Tools (mentioning)

confidence: 99%

Studia Linguistica Hungarica 33 (2021)

2021

SLH

Studia Linguistica Hungarica was originally a yearbook of Eötvös Loránd University (ELTE), under the full title of Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae, Sectio linguistica. It formed part of a collection of university yearbooks in various disciplines, and served the purpose of making the results of ELTE-based research in linguistics available to an international audience beyond the Iron Curtain. The first volume of the yearbook appeared in 1970, and a total of 26 volumes were published by 2005. From 1990, financial problems hindered year-by-year appearance. Throughout this period, Annales was edited by Prof. István Szathmári. The articles were written in a variety of languages including English, German, French, Latin, Russian, Spanish, and others. Thematically, they covered the most diverse fields of research on a wide range of languages. The journal now reappears with a new title, new editorial and advisory boards, and under very different circumstances. Studia Linguistica Hungarica publishes peer-reviewed papers with a thematic focus on Hungarian and a general theoretical and typological orientation. Contributions adopting a usage-based cognitive theoretical perspective are especially, but not exclusively, welcome. The thematic scope of the journal ranges from semantics, syntax, and phonology to pragmatics, text linguistics and stylistics, from both descriptive and historical viewpoints. A single issue is published per year, with papers written in English.
