“…These also include lexical variants corresponding to archaisms and neologisms, as well as dialectal forms and domain-specific terminology. Below we report some examples recorded in the Voci della Grande Guerra corpus, which collects texts of different genres and linguistic registers from the period of the First World War (De Felice et al. 2018): obsolete forms rarely used in contemporary Italian (e.g., costì, tardanza); literary forms (e.g., pelago, nocumento); variants of current forms and/or lemmas (e.g., comperare for comprare, spedale for ospedale); and diatopically marked forms, either typical of a regional variety of Italian (e.g., cocuzza, mencio) or dialectal (e.g., batajun, preive). There are also graphical variants of contemporary forms (e.g., pei for per i, pur troppo for purtroppo), which also affect how the text is segmented into tokens.…”
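Graphical variants interact with segmentation in two opposite ways: one surface token can stand for two words (pei → per i), and two surface tokens can stand for one word (pur troppo → purtroppo). The Python snippet below is a minimal sketch of that behaviour, not the procedure used for the corpus. The regex tokenizer and the three lookup tables (MERGE, SPLIT, NORMALIZE) are hypothetical and contain only the examples cited above; a real resource would need far larger tables and lemma-level mapping for inflected forms.

```python
import re

# Hypothetical lookup tables, filled only with the examples cited in the text.
# Two surface tokens that correspond to a single contemporary word.
MERGE = {("pur", "troppo"): "purtroppo"}
# One surface token that corresponds to two syntactic words.
SPLIT = {"pei": ["per", "i"]}
# Variant forms mapped to their contemporary equivalent (exact-form match only).
NORMALIZE = {"comperare": "comprare", "spedale": "ospedale"}


def tokenize(text):
    """Naive tokenizer (an assumption): words or single punctuation marks.

    In Python 3, \\w is Unicode-aware, so accented forms such as 'costì'
    stay in one piece.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def normalize(tokens):
    """Merge, split and normalize graphical variants in a token list."""
    out = []
    i = 0
    while i < len(tokens):
        # Look ahead one token to catch two-token variants such as 'pur troppo'.
        if i + 1 < len(tokens):
            pair = (tokens[i].lower(), tokens[i + 1].lower())
            if pair in MERGE:
                out.append(MERGE[pair])
                i += 2
                continue
        low = tokens[i].lower()
        if low in SPLIT:
            # One surface token expands to several words.
            out.extend(SPLIT[low])
        else:
            # Replace known variants; keep every other token unchanged.
            out.append(NORMALIZE.get(low, tokens[i]))
        i += 1
    return out
```

For example, `normalize(tokenize("Pur troppo andai pei campi."))` yields `["purtroppo", "andai", "per", "i", "campi", "."]`: the sentence has six surface tokens before and after normalization, but the word boundaries differ, which is exactly why such variants matter for any downstream annotation.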