Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019) 2019
DOI: 10.18653/v1/w19-5121

The Impact of Word Representations on Sequential Neural MWE Identification

Abstract: Recent initiatives such as the PARSEME shared task have allowed the rapid development of MWE identification systems. Many of those are based on recent NLP advances, using neural sequence models that take continuous word representations as input. We study two related questions in neural verbal MWE identification: (a) the use of lemmas and/or surface forms as input features, and (b) the use of word-based or character-based embeddings to represent them. Our experiments on Basque, French, and Polish show that char…

Cited by 4 publications (8 citation statements). References 16 publications (17 reference statements).
“…Figure 1 shows the distribution of papers across the 24 languages considered by our paper sample. The reasons for choosing a given corpus and/or set of languages in non-ST works vary: language diversity (Zampieri et al., 2019), corpus domain (Liu et al., 2021), and corpus quality and size (Pasquer et al., 2020b).…”
Section: Corpus Constitution and Selection
Confidence: 99%
“…Most papers do not explicitly mention their strategy for dealing with overlapping MWEs. When mentioned, overlapping MWE annotations are either ignored (Zampieri et al., 2022a), duplicated into separate sentences (Zampieri et al., 2018), or handled by the tagging scheme (Yirmibeşoglu and Güngör, 2020).…”
Section: Corpus and Splits
Confidence: 99%