Recognizing named entities in Spanish medieval texts is highly complex and poses specific challenges: first, the complex morphosyntactic behaviour of proper nouns in medieval texts; second, the lack of strict orthographic standards; and finally, diachronic and geographical variation in Spanish from the 12th to the 15th century. In this period, named entities usually appear as complex textual structures. For example, nicknames and information about a person's role in society and geographic origin were frequently added. To tackle this complexity, a named entity recognition and classification system has been implemented. The system uses contextual cues based on semantics to detect entities and assign a type. Given the occurrence of entities with attached attributes, entity contexts are also parsed to determine entity-type-specific dependencies for these attributes. Moreover, it uses a variant generator to handle the diachronic evolution of Spanish medieval terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its own lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87. Attribute annotation was evaluated for person and role name attributes, with an overall F1 of 0.75.
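To make the idea of a variant generator more concrete, the following is a minimal sketch, not the system described in the abstract. It assumes a small, hypothetical table of spelling alternations (`ALTERNATIONS`) of the kind commonly found in medieval Spanish manuscripts (u/v, i/y, ç/z, ff/f); the function names `segment_options` and `generate_variants` are likewise invented for illustration. The real tool works from phonetic and morphosyntactic rules that the abstract does not detail.

```python
from itertools import product

# Hypothetical alternation groups, for illustration only; the rule set
# of the system described in the abstract is not given there.
ALTERNATIONS = [("ff", "f"), ("u", "v"), ("i", "y"), ("ç", "z")]


def segment_options(word):
    """Split a word into segments, each with its interchangeable spellings."""
    options = []
    pos = 0
    while pos < len(word):
        for group in ALTERNATIONS:
            # Try longer spellings first so that "ff" wins over "f".
            by_length = sorted(group, key=len, reverse=True)
            match = next((g for g in by_length if word.startswith(g, pos)), None)
            if match:
                options.append(group)
                pos += len(match)
                break
        else:
            # No alternation applies here: keep the character unchanged.
            options.append((word[pos],))
            pos += 1
    return options


def generate_variants(word):
    """Return every spelling obtainable by combining the alternations."""
    return {"".join(choice) for choice in product(*segment_options(word.lower()))}


print(sorted(generate_variants("Vrraca")))
print(sorted(generate_variants("ffernando")))
```

A gazetteer lookup could then test each generated variant rather than only the attested spelling. The number of variants grows exponentially with the number of alternating segments, so a practical generator would need to prune or rank candidates.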
How has the sonnet form in Spanish evolved over the centuries? What is the distribution of metrical patterns and combinations thereof, considering diachronic, geographical, and social factors? What rhyme schemes are favoured in different periods and regions? How is enjambment distributed within the sonnet? Providing quantitative answers to such questions requires a corpus spanning several centuries, annotated for the relevant literary features and containing author metadata. The absence of appropriate digital resources to undertake a macroanalytic study of the evolution of the sonnet in Spanish led us to create the Diachronic Spanish Sonnet Corpus. This article presents how the corpus was designed to provide quantitative evidence on the evolution of sonnets in Spanish, and our findings regarding metrics and enjambment. The corpus contains 4,085 sonnets by 1,204 Spanish and Latin American authors (15th to 19th centuries), encoded in TEI, with RDFa attributes. The corpus aims at breadth, including many peripheral authors besides some major ones. Author metadata were encoded (dates, origin, gender). Scansion and enjambment were annotated automatically with the ADSO and ANJA tools. The range of authors and periods, the use of TEI and RDFa for interoperability, and the combination of metrical and enjambment annotations go beyond previously available digital resources. The corpus allowed us to examine the evolution of metrical patterns and their combinations after the Golden Age, complementing earlier studies. We also observed an increase in enjambment across the tercets in the 19th century, which may indicate increased variety in the discourse organization of sonnets in the period.
The use of artificial intelligence and natural language processing techniques has increased considerably in the last few decades. Historically, the focus has been primarily on texts expressed in prose form, leaving mostly aside figurative or poetic expressions of language due to their rich semantics and syntactic complexity. The creation and analysis of poetry have commonly been carried out by hand, with a few computer-assisted approaches. In the Spanish context, the promise of machine learning is starting to pan out in specific tasks such as metrical annotation and syllabification. However, one task remains unexplored and underdeveloped: stanza classification. This classification of the inner structures of verses on which a poem is built is an especially relevant task for poetry studies, since it complements the structural information of a poem. In this work, we analyzed different computational approaches to stanza classification in the Spanish poetic tradition. These approaches show that this task continues to be hard for computer systems, whether based on classical machine learning approaches or on statistical language models, and that such systems cannot compete with traditional computational paradigms based on the knowledge of experts.
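As an illustration of what an expert-knowledge paradigm for stanza classification can look like, here is a toy sketch, not the method evaluated in the work above. It assumes that syllable counts and the rhyme scheme have already been computed for each stanza, and it covers only four well-known four-line Spanish stanza types; the rule table `RULES` and the function names are hypothetical.

```python
# Toy rule table: (number of lines, rhyme scheme, verse type) -> stanza name.
# "major" = arte mayor (lines of nine or more syllables),
# "minor" = arte menor (lines of eight or fewer syllables).
RULES = {
    (4, "abba", "major"): "cuarteto",
    (4, "abab", "major"): "serventesio",
    (4, "abba", "minor"): "redondilla",
    (4, "abab", "minor"): "cuarteta",
}


def verse_type(syllable_counts):
    """Classify the lines of a stanza as arte mayor, arte menor, or mixed."""
    if all(count > 8 for count in syllable_counts):
        return "major"
    if all(count <= 8 for count in syllable_counts):
        return "minor"
    return "mixed"


def classify_stanza(syllable_counts, rhyme_scheme):
    """Look up the stanza type; return "unknown" when no rule matches."""
    key = (len(syllable_counts), rhyme_scheme.lower(), verse_type(syllable_counts))
    return RULES.get(key, "unknown")


print(classify_stanza([11, 11, 11, 11], "ABBA"))
print(classify_stanza([8, 8, 8, 8], "abab"))
```

A real rule-based system would need many more rules, tolerance for scansion errors, and handling of assonance versus consonance, none of which this sketch attempts.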
Enjambment takes place when a syntactic unit is broken up across two lines of poetry, giving rise to different stylistic effects. In Spanish literary studies, it remains unclear which types of stylistic effects can arise, and under which linguistic conditions. To gather evidence about this systematically, we developed a system to automatically identify enjambment (and its type) in Spanish. For evaluation, we manually annotated a reference corpus covering different periods. To apply the tool to a scholarly corpus, we created a diachronic corpus covering four centuries of sonnets (3,750 poems) from public HTML sources, and we analyzed the occurrence of enjambment across stanzaic boundaries in different periods. In addition, we found examples that highlight limitations in current definitions of enjambment.
The objective of this study was to provide the global community of interested scholars with an updated understanding of digital humanities in Spain, in terms of researchers and research centres, disciplines involved and research topics of interest, trends in digital resources development, main funding bodies and the evolution of their investment since the early nineties. One of the characteristics that differentiates this study from previous approaches is the information used to carry out the research. It combines large datasets of publicly available data from trusted sources with a handpicked selection of records grouping information scattered over the Web. Most of the evidence detected by other studies has been numerically confirmed. At the same time, the new metrics and values established constitute a reference base for monitoring the future evolution of the discipline, and thus favour comparisons. Half of the researchers were found to be affiliated to only nine institutions, whereas the other half of them were scattered across 84 locations. Department affiliation showed a varied pattern of the different degrees of specialization in each institution. Although the major historic role played by Philology was confirmed, the rising interest of other areas of the Humanities and Social Science produces a wider picture, which helped to identify five large clusters of research topics, centred on major disciplines. The quantitative analysis of funding, a dimension almost unexplored in the Humanities, proved to be a valuable way to assess the discipline and its historical evolution. In fact, it revealed interesting trends that led to our proposal of a three-phase periodization in the consolidation of digital humanities in Spain. The paper concludes with a set of recommendations regarding how to deal with issues that could harm the future development of this research area and the role that Spanish researchers can play in the international context.