2009
DOI: 10.1007/s10579-009-9095-y

Analyzing and identifying multiword expressions in spoken language

Abstract: The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measu…

Citations: cited by 6 publications (2 citation statements)
References: 31 publications (36 reference statements)
“…Cappelle, Shtyrov, & Pulvermüller, 2010). The proposal that fast-processing leads to changes in pronunciation (for a review, see Lin, 2010) has also led to explorations of ways to extract formulaic sequences automatically from spoken corpora (e.g., Strik, Hulsbosch, & Cucchiarini, 2010).…”
Section: How Are Formulaic Sequences Processed? (mentioning)
confidence: 99%
“…A popular type-independent alternative to MWE identification is to use statistical AMs (Evert and Krenn, 2005; Zhang et al, 2006; Villavicencio et al, 2007). Concerned MWE identification and extraction from monolingual corpora, proposed a method for automatically identifying English verb particle constructions (VPCs), Pecina (2009) reported an evaluation of a set of lexical association measures based on the Prague Dependency Treebank and the Czech National Corpus, Strik et al (2010) investigated the possible ways of automatically identifying Dutch MWEs in speech corpora. Related to lexical representation of MWEs in a lexicon and a syntactic treebank, Gregoire (2010) discusses the design and implementation of a Dutch Electronic Lexicon of Multiword Expressions (DuELME), which contains over 5,000 Dutch multiword expressions.…”
Section: Related Work (mentioning)
confidence: 99%