2017
DOI: 10.1162/coli_a_00302

Multiword Expression Processing: A Survey

Abstract: Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is no…

Cited by 153 publications (132 citation statements) · References 109 publications

Citation statements, ordered by relevance:
“…verb subcategorization, TAG or FrameNet), the main difference can be phrased as follows. First, our model puts great emphasis on filled places and, accordingly, on complex proper VCCs (which have filled places and possibly free places as well), connecting our approach to multiword expression processing [2]. Second, our model aims to represent not just one VCC but all the VCCs of a corpus together, including their relationships to each other, in order to tackle proper VCCs based on this combined model.…”
Section: Model for a Whole Corpus: The Corpus Lattice
confidence: 99%
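
The excerpt above describes a corpus-wide lattice of verb-centred constructions (VCCs) whose places are either filled by concrete lemmas or left free. A minimal sketch of one way such objects and their generalization relationships could be represented follows; the class names, the generalization rule, and the examples are illustrative assumptions, not the cited paper's actual model.

```python
# Hedged sketch: VCCs with filled/free places and a generalization
# relation that could induce a corpus lattice. All names are assumed.
from dataclasses import dataclass
from typing import Optional, Tuple

FREE = None  # marker for a free (unfilled) place

@dataclass(frozen=True)
class VCC:
    """A verb-centred construction: a verb plus ordered places,
    each either filled with a lemma or left free (None)."""
    verb: str
    places: Tuple[Optional[str], ...]

def generalizes(a: VCC, b: VCC) -> bool:
    """a generalizes b if every place of a is free or matches b's filler."""
    return (a.verb == b.verb
            and len(a.places) == len(b.places)
            and all(pa is FREE or pa == pb
                    for pa, pb in zip(a.places, b.places)))

# The lattice over all VCCs of a corpus would then be the partial order
# induced by `generalizes`: e.g., the schematic "take X" sits above the
# fully filled "take a decision".
concrete = VCC("take", ("decision",))
schematic = VCC("take", (FREE,))
assert generalizes(schematic, concrete)
```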
“…Many errors were caused by typos: even the trivial lack of a space between two words may prevent the tokenizer from correctly recognizing the terms involved in the linguistic expression; tools designed for interactive correction and semantic annotation, also with a special focus on narrative clinical reports, could be adopted [32,33]. Additionally, one desideratum would be identifying multiword expressions such as 'neck of the bottle' or 'lacerated bruised wound' that need to be handled as a whole (and that, conversely, cannot be dealt with token by token) [34]. Unfortunately, in the considered domain and for the considered text excerpts, standard approaches such as mwetoolkit [35] are so frequently misled that their adoption does not ensure a substantial processing advantage.…”
Section: Discussion
confidence: 99%
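
The excerpt's point that expressions like 'neck of the bottle' must be handled as a whole, rather than token by token, can be illustrated with a greedy longest-match pass that merges known MWEs into single units before any per-token analysis. This is a hedged sketch with a toy lexicon, not mwetoolkit's actual pipeline.

```python
# Illustrative sketch: merge known MWEs into single tokens so that
# downstream, token-by-token processing sees them as one unit.
MWE_LEXICON = {
    ("neck", "of", "the", "bottle"),
    ("lacerated", "bruised", "wound"),
}
MAX_LEN = max(len(m) for m in MWE_LEXICON)

def merge_mwes(tokens):
    """Greedily replace the longest known MWE starting at each
    position with a single underscore-joined token."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in MWE_LEXICON:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:  # no MWE starts here; keep the single token
            out.append(tokens[i])
            i += 1
    return out

print(merge_mwes("the neck of the bottle was broken".split()))
# ['the', 'neck_of_the_bottle', 'was', 'broken']
```

A lexicon lookup like this also makes the excerpt's typo problem concrete: if the input lacks a space (e.g. 'ofthe'), the merged tuple never matches and the expression silently falls back to per-token handling.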
“…They tested different network architectures, e.g., a layered feed-forward network and a recurrent neural network, and all of them outperformed more traditional MWE identification methods. Approaches based on deep learning have the advantage that they can easily leverage pre-trained word vectors as features (Constant et al., 2017; Taslimipoor and Rohanian, 2018; Ehren et al., 2018). The method described in this work also relies on pre-trained word vectors.…”
Section: Related Work
confidence: 99%
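
As a rough illustration of the feature setup described above, the sketch below loads text-format pre-trained vectors and uses them to initialize the embedding layer of a neural tagger. The file name, dimensionality, and loader are assumptions for illustration, not any cited system's code.

```python
# Hedged sketch: initialize an embedding layer from pre-trained
# word vectors (word2vec/GloVe-style text format: `word v1 ... vd`).
import numpy as np
import torch
import torch.nn as nn

def load_vectors(path):
    """Read one `word v1 v2 ... vd` entry per line."""
    vocab, rows = {}, []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vocab[parts[0]] = len(rows)
            rows.append([float(x) for x in parts[1:]])
    return vocab, np.array(rows, dtype=np.float32)

vocab, matrix = load_vectors("glove.6B.100d.txt")  # hypothetical path
embedding = nn.Embedding.from_pretrained(
    torch.from_numpy(matrix), freeze=False)  # allow fine-tuning
# Token ids looked up through `vocab` can now feed this layer, and the
# resulting vectors a feed-forward or recurrent MWE tagger.
```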
“…One of the main tasks that constitute MWE processing is the automatic identification of MWEs in running text, which can serve as a preprocessing step for parsing or machine translation. MWE identification can be seen as a sequence labeling task similar to named entity recognition (NER): a system receives sequences of tokens as input and outputs the same sequences with annotation labels added to them (Constant et al., 2017). As in NER, most of each sequence belongs to the negative class; that is, the majority of words are not part of an MWE.…”
Section: Introduction
confidence: 99%
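
The sequence-labeling framing in the excerpt can be made concrete with BIO labels, as in NER. The helper below, which converts gold MWE spans into per-token labels, is an illustrative sketch (names assumed); its output also shows the class imbalance the excerpt mentions, since most tokens receive the negative O label.

```python
# Hedged sketch: BIO encoding of MWE spans for sequence labeling.
def spans_to_bio(tokens, mwe_spans):
    """mwe_spans: (start, end) token-index pairs, end exclusive."""
    labels = ["O"] * len(tokens)  # negative class: not part of an MWE
    for start, end in mwe_spans:
        labels[start] = "B-MWE"   # first token of the expression
        for i in range(start + 1, end):
            labels[i] = "I-MWE"   # continuation tokens
    return labels

tokens = "he kicked the bucket yesterday".split()
print(spans_to_bio(tokens, [(1, 4)]))
# ['O', 'B-MWE', 'I-MWE', 'I-MWE', 'O']
```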