Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is not only to provide a focused review of MWE processing, but also to clarify the nature of interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by “MWE processing,” distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: Parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to underlying use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and for each of the two use cases, we conclude on open issues and research perspectives.
This study examines memory retrieval and syntactic composition using fMRI while participants listen to a book, The Little Prince. These two processes are quantified drawing on methods from computational linguistics. Memory retrieval is quantified via multi-word expressions that are likely to be stored as a unit, rather than built-up compositionally. Syntactic composition is quantified via bottom-up parsing that tracks tree-building work needed in composed syntactic phrases. Regression analyses localise these to spatially-distinct brain regions. Composition mainly correlates with bilateral activity in anterior temporal lobe and inferior frontal gyrus. Retrieval of stored expressions drives right-lateralised activation in the precuneus. Less cohesive expressions activate well-known nodes of the language network implicated in composition. These results help to detail the neuroanatomical bases of two widely-assumed cognitive operations in language processing.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.