A Comparison of Feature-Based and Neural Scansion of Poetry

Agirrezabal, Manex; Alegria, Iñaki; Hulden, Mans

doi:10.26615/978-954-452-049-6_003

Cited by 10 publications

(18 citation statements)

References 23 publications

(20 reference statements)

“…We only chose <real> when <met> doesn't match the syllable count (ca. 200 cases), likely deviating from the setup in (Agirrezabal et al., 2016, 2019).…”

Section: Learning Meter (mentioning)

confidence: 93%
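
The selection rule in the quote above can be sketched in Python as follows. This is a minimal illustration, not Haider's code. It assumes stress patterns are strings with one '+' (stressed) or '-' (unstressed) per syllable and '|' marking foot boundaries, and the helper names syllables_in and choose_pattern are hypothetical. The quote does not say what happens when <real> also fails to match, so the sketch simply returns <real> in that case.

```python
def syllables_in(pattern: str) -> int:
    """Number of syllables in a stress pattern: one '+' or '-' per syllable.
    Any other character (assumed here to be '|' foot boundaries) is ignored."""
    return sum(1 for ch in pattern if ch in "+-")


def choose_pattern(met: str, real: str, n_syllables: int) -> str:
    """Keep the metrical pattern (<met>) unless its length disagrees with the
    line's syllable count; in that case fall back to the realized pattern (<real>)."""
    if syllables_in(met) == n_syllables:
        return met
    return real


# A ten-syllable line whose <met> fits: <met> is kept.
print(choose_pattern("-+|-+|-+|-+|-+", "+-|-+|-+|-+|-+", 10))  # -+|-+|-+|-+|-+
# An eleven-syllable line: <met> has only ten stress marks, so <real> is used.
print(choose_pattern("-+|-+|-+|-+|-+", "-+|-+|-+|-+|-+-", 11))  # -+|-+|-+|-+|-+-
```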

“…The annotated corpora for English include: (1) the for-better-for-verse (FORB) collection, with around 1200 lines, which was used by Agirrezabal et al. (2016, 2019), and (2) the 1700 lines of poetry against which prosodic (Anttila and Heuser, 2016; Algee-Hewitt et al., 2014) was evaluated (PROS). We merge these with our own (3) 1200 lines in 64 English poems (EPG64).…”

Section: Additional Data and Format (mentioning)

confidence: 99%

Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features

Haider

2021

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

A prerequisite for the computational study of literature is the availability of properly digitized texts, ideally with reliable metadata and ground-truth annotation. Poetry corpora do exist for a number of languages, but larger collections lack consistency and are encoded in various standards, while annotated corpora are typically constrained to a particular genre and/or were designed for the analysis of certain linguistic features (like rhyme). In this work, we provide large poetry corpora for English and German, and annotate prosodic features in smaller corpora to train corpus-driven neural models that enable robust large-scale analysis. We show that BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and different BERT-based approaches. In a multi-task setup, particularly beneficial task relations illustrate the interdependence of poetic features: a model learns foot boundaries better when jointly predicting syllable stress; aesthetic emotions and verse measures benefit from each other; and we find that caesuras are quite dependent on syntax and also integral to shaping the overall measure of the line.
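
In a BiLSTM-CRF tagger, the BiLSTM assigns each syllable a score per tag, and the CRF layer picks the jointly best tag sequence with Viterbi decoding. The sketch below shows only that decoding step for binary stress tags (0 = unstressed, 1 = stressed). The emission and transition scores are hand-set toy numbers standing in for learned ones, and the function name viterbi is hypothetical; this is not the implementation used in the paper.

```python
from typing import List, Sequence, Tuple


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            start: Sequence[float]) -> Tuple[List[int], float]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]   -- score of tag j at syllable t (e.g. produced by a BiLSTM)
    transitions[i][j] -- score of tag i being followed by tag j
    start[j]          -- score of the first syllable having tag j
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for ending at tag j on syllable t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Follow the back-pointers from the best final tag.
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


TAGS = "-+"                                    # 0 = unstressed, 1 = stressed
transitions = [[-1.0, 1.0], [1.0, -1.0]]       # toy scores rewarding alternation
start = [0.0, 0.0]
emissions = [[0.5, 0.0], [0.0, 0.2], [0.0, 0.3], [0.0, 0.1]]  # four syllables

path, best = viterbi(emissions, transitions, start)
greedy = [max(range(2), key=lambda j: e[j]) for e in emissions]
print("".join(TAGS[i] for i in path), round(best, 2))  # -+-+ 3.8
print("".join(TAGS[i] for i in greedy))                # -+++
```

Tagging each syllable independently yields -+++ here, while the CRF decoder trades a slightly worse emission score on the third syllable for better transitions and returns the alternating -+-+. This label-to-label dependency is what the CRF layer adds on top of the BiLSTM.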

“…
System                        Accuracy
(Gervas, 2000)                88.73
(Navarro-Colorado, 2017)      94.44
(Agirrezabal et al., 2017)    90.84
Rantanplan (ours)             96.23

Lastly, the PoetryLab API provides a pluggable architecture that allows for the integration of external packages developed in languages other than Python. This is the case for our named entity recognition system, HisMeTag (Platas et al., 2021), developed in Java and connected to the PoetryLab API through an internal REST API exposed via Docker.…”

Section: Methods (mentioning)

confidence: 99%
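
The pluggable design described in the quote above can be sketched as a registry that maps analysis names to callables, where an external tool (such as a Java service running in Docker) is wrapped behind an HTTP adapter. Everything here is hypothetical: the class AnalyzerRegistry, the function rest_analyzer, the JSON request shape {"text": ...} and the endpoint URL are illustrative assumptions, not the actual PoetryLab API.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# An analyzer takes a text and returns a JSON-serializable result.
Analyzer = Callable[[str], dict]


class AnalyzerRegistry:
    """Hypothetical registry mapping analysis names to analyzer callables."""

    def __init__(self) -> None:
        self._analyzers: Dict[str, Analyzer] = {}

    def register(self, name: str, analyzer: Analyzer) -> None:
        self._analyzers[name] = analyzer

    def names(self) -> List[str]:
        return sorted(self._analyzers)

    def analyze(self, name: str, text: str) -> dict:
        if name not in self._analyzers:
            raise KeyError(f"no analyzer registered under {name!r}")
        return self._analyzers[name](text)


def rest_analyzer(url: str, timeout: float = 10.0) -> Analyzer:
    """Wrap an external service that (by assumption) accepts a POSTed
    {"text": ...} JSON body and answers with a JSON object."""
    def call(text: str) -> dict:
        body = json.dumps({"text": text}).encode("utf-8")
        req = urllib.request.Request(
            url, data=body,
            headers={"Content-Type": "application/json"}, method="POST")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return call


registry = AnalyzerRegistry()
# A native Python analyzer.
registry.register("word_count", lambda text: {"words": len(text.split())})
# A REST-backed analyzer; the URL is hypothetical and is only contacted
# when analyze("ner", ...) is actually called.
registry.register("ner", rest_analyzer("http://localhost:8080/ner"))

print(registry.analyze("word_count", "En tanto que de rosa y azucena"))  # {'words': 7}
```

Callers go through analyze() in the same way for both kinds of analyzer, so the language an external tool is written in stays hidden behind the adapter.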

“…With ever-increasing corpus sizes and the popularization of distant reading techniques, the possibility of automating part of the analysis became very attractive. Although solutions exist, they are either incomplete, e.g., scansion of fixed-metre poetry (Agirrezabal et al., 2016; Navarro-Colorado, 2017; Gervas, 2000; Agirrezabal et al., 2017), not applicable to Spanish (Agirrezabal et al., 2017; Hartman, 2005), or not open or reproducible (Gervas, 2000). Moreover, disparate input and output formats, operating system requirements and dependencies, and the lack of interoperability between software packages further complicated the limited ecosystem of tools to analyze Spanish poetry.…”

Section: PoetryLab (mentioning)

confidence: 99%

PoetryLab as Infrastructure for the Analysis of Spanish Poetry

Rosa, Pérez, …ández et al. 2021

Linköping Electronic Conference Proceedings

The development of the network of ontologies of the ERC POSTDATA Project brought to light some deficiencies in terms of completeness in the currently available European poetry corpora. To tackle the issue in the realm of the Spanish poetic tradition, our approach consisted of designing a set of tools that any scholar could use to automatically enrich the analysis of Spanish poetry. The effort crystallized in the PoetryLab, an extensible open-source toolkit for syllabification, scansion, enjambment detection, rhyme detection, stanza identification, and historical named entity recognition for Spanish poetry. We designed the system to be interoperable, compliant with the project ontologies, easy to use for both tech-savvy and non-expert researchers, and to require minimal maintenance and setup. Furthermore, we propose the integration of the PoetryLab as a core functionality in the CLARIN tool catalog for Spanish poetry.

“…However, these results are restricted to hendecasyllabic verses. Shortly after that, Agirrezabal used neural networks to predict the metrical pattern of lines of verse [27]. The model proposed was a character-based bidirectional long short-term memory (BiLSTM) neural network with conditional random fields.…”

Section: Related Work (mentioning)

confidence: 99%

Automated Metric Analysis of Spanish Poetry: Two Complementary Approaches

et al. 2021

The automatic metric analysis (commonly referred to as scansion) of Spanish poetry is not a trivial problem, since it combines the nuances of the language, the different poetic traditions related to melodic patterns, and the personal stylistic preferences and intentions of the author. In this paper, we explore two alternative algorithmic approaches tailored to different application scenarios. The first approach, Rantanplan, is a rule-based method that consists of four Natural Language Processing modules that work together to perform scansion and other related analyses: part-of-speech tagging, syllabification, stress assignment, and metrical adjustment. The second approach, Jumper, explores the possibility of performing scansion without syllabification, with a twofold purpose: to minimize the errors propagated through the different parts of the linguistic processing pipeline (including the syllabification step), and to improve the efficiency of the process. Both systems outperform the state of the art and provide either a more informative solution (suitable, for instance, for teaching purposes) or more efficient processing (when a correct scansion is all the linguistic knowledge required, as in scholarly philological studies). The combined use of both systems turns out to provide a practical tool to clean up manual annotation errors in corpora.
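
One textbook rule behind the "metrical adjustment" step of Spanish scansion is the final-stress rule: a line ending in an oxytone word (aguda) counts one syllable more than its phonological syllables, a paroxytone ending (llana) counts the same, and a proparoxytone ending (esdrújula) counts one fewer, after synalephas across word boundaries have been merged. The function below is a simplified sketch of just that rule, with the hypothetical name metrical_syllables; it is not Rantanplan's implementation, which must also decide when synalepha, synaeresis or hiatus applies.

```python
def metrical_syllables(phonological: int, synalephas: int, stress_from_end: int) -> int:
    """Metrical length of a Spanish verse line under the final-stress rule.

    phonological    -- syllables counted word by word, before any licences
    synalephas      -- vowel junctions across word boundaries merged into one syllable
    stress_from_end -- position of the line's last stressed syllable, counted from
                       the end: 1 = aguda (oxytone), 2 = llana (paroxytone),
                       3 = esdrújula (proparoxytone)
    """
    if stress_from_end not in (1, 2, 3):
        raise ValueError("stress_from_end must be 1, 2 or 3")
    if not 0 <= synalephas < phonological:
        raise ValueError("synalephas must be between 0 and phonological - 1")
    count = phonological - synalephas
    if stress_from_end == 1:
        count += 1  # oxytone ending: count one extra syllable
    elif stress_from_end == 3:
        count -= 1  # proparoxytone ending: count one syllable fewer
    return count


# Three different endings that all scan as hendecasyllables (11 syllables).
print(metrical_syllables(10, 0, 1))  # 11
print(metrical_syllables(12, 1, 2))  # 11
print(metrical_syllables(13, 1, 3))  # 11
```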
