DOI: 10.33612/diss.131057087

A Bigger Fish to Fry: Scaling up the Automatic Understanding of Idiomatic Expressions

Abstract: It's been 4½ years since I started this PhD-trajectory, not knowing what I was getting myself into. I still don't know, but I do know it's finished! So, it is time to say thanks to those people who got me to the end. First and foremost, I'm grateful to my supervisors, Johan and Malvina. Johan, thanks for giving me the opportunity to be part of a very cool research project, but also to find my own research interests. Malvina, thanks for your optimism and always coming up with new ideas and side-projects. Thanks …

Cited by 12 publications (18 citation statements)
References 45 publications (84 reference statements)
“…One hundred and fifty pairs of English and Italian idioms with similar meanings were selected for the study. Initially, a larger sample (=600) of English idioms was pooled from several existing resources (the MAGPIE and TInCAP corpora by Haagsma, 2020 and Wagner, 2021, plus the dataset resulting from the norming study by Bulkes & Tanner, 2017). Afterwards, the corresponding Italian idioms were manually selected by exploiting online resources and corpora.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Following Hubers et al (2019), we define Content-Based Variables ("CBVs") as variables whose assessment is based on an idiom's linguistic content. The CBVs considered are literal plausibility, decomposability and transparency; they were selected because idioms positively characterised by such features seem to be more likely to occur in ambiguous contexts (Haagsma, 2020; Vulchanova, Milburn, Vulchanov, & Baggio, 2019; Wagner, 2021).…”
Section: Content-based Variables
Citation type: mentioning
confidence: 99%
“…• English: Our English idioms are sourced from the MAGPIE (Haagsma, Bos, and Nissim 2020), IMIL (Agrawal et al 2018) […] OpenMWE is designed for idiom identification and includes many idiomatic and literal sentences per idiom. ID10M collects idioms from several languages but does not include their meanings.…”
Section: Source Data Collection
Citation type: mentioning
confidence: 99%
“…English Magpie corpus (Haagsma et al, 2020), and the English, Portuguese and Galician Semeval 2022 task 2 corpora (Tayyar Madabushi et al, 2022). The PARSEME collection could be extended to include literal readings (Savary et al, 2019), and this was explored for German (Ehren et al, 2020).…”
Section: Corpus and Splits
Citation type: mentioning
confidence: 99%
“…The task of identifying Multiword Expressions (MWEs) in texts, as defined by […], can be modeled using several paradigms: syntactic parsing (Nagy T. and Vincze, 2014; Constant and Nivre, 2016), compositionality prediction of MWE candidates (Cook et al, 2008; Haagsma et al, 2020; Garcia et al, 2021), or sequence annotation (Constant et al, 2012; Schneider et al, 2014). The sequence annotation paradigm has been recently popularised by the DiMSUM shared task (Schneider et al, 2016), and by three editions of the PARSEME shared tasks (Savary et al, 2017; Ramisch et al, 2018a, 2020).…”
Section: Introduction
Citation type: mentioning
confidence: 99%