Multiword Expression Processing: A Survey

Constant, Mathieu; Eryiğit, Gülşen; Monti, Johanna; Plas, Lonneke van der; Ramisch, Carlos; Rosner, Michael; Todiraşcu, Amalia

doi:10.1162/coli_a_00302

Cited by 153 publications

(132 citation statements)

References 109 publications

Supporting

Mentioning

126

Contrasting

Unclassified

Order By: Relevance

“…verb subcategorization, TAG or FrameNet), main difference can be phrased as follows. Firstly, our model puts great emphasis on filled places, and accordingly on complex proper VCCs (which have filled places and possibly free places as well), connecting our approach to multiword expression processing [2]. Secondly, the aim of our model is to represent not just one VCC but all VCCs of a corpus together including their relationships to each other, in order to be able to tackle proper VCCs based on this combined model.…”

Section: Model For a Whole Corpus: The Corpus Latticementioning

confidence: 99%

A Lattice Based Algebraic Model for Verb Centered Constructions

Sass

2018

Text, Speech, and Dialogue

View full text Add to dashboard Cite

In this paper we present a new, abstract, mathematical model for verb centered constructions (VCCs). After defining the concept of VCC we introduce proper VCCs which are roughly the ones to be included in dictionaries. First, we build a simple model for one VCC utilizing lattice theory, and then a more complex model for all the VCCs of a whole corpus combining representations of single VCCs in a certain way. We hope that this model will stimulate a new way of thinking about VCCs and will also be a solid foundation for developing new algorithms handling them.

show abstract

Section: Model For a Whole Corpus: The Corpus Latticementioning

confidence: 99%

A Lattice Based Algebraic Model for Verb Centered Constructions

Sass

2018

Text, Speech, and Dialogue

View full text Add to dashboard Cite

show abstract

“…Many errors were caused by typos: even the trivial lack of a space between two words may prevent the tokenizer from correctly recognizing the terms involved in the linguistic expression, and tools could be adopted that have been designed for the interactive correction and semantic annotation, also with special focus on narrative clinical reports [32,33]. Additionally, one desideratum would be individuating multiword expressions such as 'neck of the bottle' or 'lacerated bruised wound' that need to be handled as a whole (and that, conversely, cannot be dealt with in a token by token mode) [34]. Unfortunately, in the considered domain and for the considered text excerpts, standard approaches such as mwetoolkit [35] are frequently mislead to such an extent that their adoption does not ensure substantial processing advantage.…”

Section: Discussionmentioning

confidence: 99%

ER documents categorization and explanation

Mensa

Colla

Dalmasso

et al. 2020

Preprint

View full text Add to dashboard Cite

Background Emergency room reports are a specific kind of text, posing specific challenges to natural language processing techniques. In this setting, violence episodes on women, elderly and children are often under-reported. Categorizing textual descriptions as containing violence-related injuries vs. non-violence-related injuries, is thus a relevant task, to the ends of devising alerting mechanisms to track violence episodes. Methods We present a system to detect episodes of violence from the textual descriptions contained in emergency room reports. It employs a deep neural network for categorizing textual ER reports data. Additionally, the system complements such output by making explicit which elements corroborate the interpretation of the record as reporting about violence-related injuries. To these ends we designed a novel hybrid technique for filling semantic frames that employs distributed representations of the terms herein, along with syntactic and semantic information. Results We tested our system on a set of real data of emergency room reports, coming from an Italian branch of the EU-Injury Database (EU-IDB) project, annotated by hospital staff. Our experimentation shows that the system produces accurate categorization (of violent vs. non violent records), paired with interesting results on the explanation of such output. At times, it also allowed unveiling annotation errors committed by hospital staff. Conclusions In the last few years deep architectures and word embeddings have been compared to a tsunami hitting AI and the area concerned with natural language processing. Only at a later time we have been realizing that the stunning output of deep networks needed to be explained: our proposal, combining distributed and symbolic (frame-like) representations are a possible answer to this pressing request for interpretability. Although the present application is focused on the medical domain, the proposed methodology is general and, in principle, it can be extended to further application areas and categorization tasks.

show abstract

“…They tested different network architectures, e.g., a layered feed-forward network and a recurrent neural network, and all of them outperformed more traditional MWE identification methods. The approaches based on deep learning have the advantage that they can easily leverage pre-trained word vectors as features (Constant et al, 2017;Taslimipoor and Rohanian, 2018;Ehren et al, 2018). The method described in this work also relies on pretrained word vectors.…”

Section: Related Workmentioning

confidence: 99%

“…One of the main tasks that constitute MWE processing is the automatic identification of MWEs in running text which can be used as a preprocessing step for parsing or machine translation. MWE identification can be seen as a sequence labeling task similar to named entity recognition (NER): A system receives sequences of tokens as input and outputs the same sequences with annotation labels added to it (Constant et al, 2017). As in NER, most parts of the sequence will belong to the negative class, that is, the majority of words is not part of an MWE.…”

Section: Introductionmentioning

confidence: 99%

A Neural Graph-based Approach to Verbal MWE Identification

Waszczuk

Ehren²,

Stodden

et al. 2019

Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

View full text Add to dashboard Cite

We propose to tackle the problem of verbal multiword expression (VMWE) identification using a neural graph parsing-based approach. Our solution involves encoding VMWE annotations as labellings of dependency trees and, subsequently, applying a neural network to model the probabilities of different labellings. This strategy can be particularly effective when applied to discontinuous VMWEs and, thanks to dense, pre-trained word vector representations, VMWEs unseen during training. Evaluation of our approach on three PARSEME datasets (German, French, and Polish) shows that it allows to achieve performance on par with the previous state-ofthe-art (Al Saied et al., 2018).

show abstract

Multiword Expression Processing: A Survey

Cited by 153 publications

References 109 publications

A Lattice Based Algebraic Model for Verb Centered Constructions

A Lattice Based Algebraic Model for Verb Centered Constructions

ER documents categorization and explanation

A Neural Graph-based Approach to Verbal MWE Identification

Contact Info

Product

Resources

About