Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)
DOI: 10.4000/books.aaccademia.3273

Italian in the Trenches: Linguistic Annotation and Analysis of Texts of the Great War

Abstract: The paper illustrates the design and development of a textual corpus representative of the historical variants of Italian during the Great War, which was enriched with linguistic (lemmatization and pos-tagging) and meta-linguistic annotation. The corpus, after a manual revision of the linguistic annotation, was used for specializing existing NLP tools to process historical texts with promising results.

Cited by 2 publications (4 citation statements). References 1 publication.

“…These also include lexical variants corresponding to archaisms, neologisms, as well as dialectal forms or terminology of a specific domain. We report below, by way of example, some cases recorded in the Voci della Grande Guerra corpus, which collects texts of different genres and linguistic registers from the period of the First World War (De Felice et al 2018): obsolete forms rarely used in contemporary Italian (e.g., costì, tardanza); literary forms, such as pelago and nocumento; variants of current forms and/or lemmas, such as comperare for comprare, spedale for ospedale; diatopically marked forms, typical of a regional variety of Italian like cocuzza or mencio, or dialectal forms like batajun or preive. In addition to these, there are graphical variants of contemporary forms (such as pei for per i, pur troppo for purtroppo) that also have an impact on sentence segmentation.…”
Section: Challenges (mentioning)
confidence: 99%
“…More recently, POS tagging and lemmatization adaptation experiments have been carried out by using (relatively small) manually revised historical corpora to retrain the tools trained on contemporary language, with significantly improved results. This is the case of De Felice et al (2018) for the Voci della Grande Guerra Corpus, of Montemagni (2021, 2022a) for a subset of the VoDIM corpus (see below), and of Favaro et al (2022) for the quotations in the Grande dizionario della lingua italiana ('Great Dictionary of Italian Language', in short GDLI). Last but not least, Palmero Aprosio, Menini, and Tonelli (2022) introduce BERToldo, one of the BERT-like models, trained from scratch on historical data.…”
Section: Solutions (mentioning)
confidence: 99%