2015
DOI: 10.3758/s13428-015-0614-z
Performance impact of stop lists and morphological decomposition on word–word corpus-based semantic space models

Abstract: Corpus-based semantic space models, which primarily rely on lexical co-occurrence statistics, have proven effective in modeling and predicting human behavior in a number of experimental paradigms that explore semantic memory representation. The most widely studied extant models, however, are strongly influenced by orthographic word frequency (e.g., Shaoul & Westbury, Behavior Research Methods, 38, 190-195, 2006). This has the implication that high-frequency closed-class words can potentially bias co-occurrence…

Cited by 7 publications (3 citation statements)
References 59 publications (77 reference statements)
“…However, even if founded on similar distributional underpinnings, OSC provides a different semantic characterization. On the one hand, the measure we computed in the present article is based on word embeddings, an approach that has proven to outperform traditional distributional models in a number of different tasks (Baroni et al., 2014b; Mandera et al., 2017) and produces more nuanced and cognitively plausible representations (Keith, Westbury, & Goldman, 2015; Mandera et al., 2017). On the other hand, OSC captures semantic information that is tightly entangled with the word's orthography and has an effect on lexical access that is independent of the one associated with the sheer semantic neighborhood (Amenta et al., 2015).…”
Section: Discussion
confidence: 99%
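Similarity over word embeddings of the kind discussed in the statement above is typically computed as the cosine between dense vectors. The following minimal sketch uses invented 3-dimensional toy vectors, not values from any of the cited models, purely to illustrate the computation.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: the cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings (made-up values for illustration only).
cat = np.array([1.0, 0.5, 0.0])
dog = np.array([0.9, 0.6, 0.1])
car = np.array([0.0, 0.2, 1.0])

print(cosine(cat, dog) > cosine(cat, car))  # semantically related words score higher
```

Real embedding models differ in how the vectors are learned (count-based vs predictive), but this similarity computation is common to both families.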
“…The compositional approach to derivation and compounding is very popular in distributional semantics and has been studied with some success for a number of languages and morphological processes (see Lazaridou et al., 2013, and Marelli & Baroni, 2015, for English; Padó et al., 2016, and Cotterell & Schütze, 2018, for German; Melymuka et al., 2017, for Ukrainian; and Günther & Marelli, 2018, on compounding). While it is less common in more traditional, count-based DSMs (but see Keith et al., 2015, and the other works cited above), this approach has found plenty of application in neural DSMs (Luong et al., 2013; Cotterell & Schütze, 2018), and it also characterizes the discriminative learning approach of Baayen et al. (2019) (who, however, focus not on subword units such as -s but on semantic units such as PLURAL). The compositional approach does, however, build on the straightforward assumption that derivational shifts are fully learnable, in the sense that they are systematic and predictable based on the meaning of the base word.…”
Section: Investigating Morphology With Distributional Semantics
confidence: 99%
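A minimal sketch of the additive compositional idea the statement describes: the vector of a derived form is approximated as the base-word vector plus a learned affix shift. The 4-dimensional vectors and word pairs below are invented toy values, not taken from the cited studies.

```python
import numpy as np

# Toy embeddings (invented values for illustration only).
vec = {
    "teach": np.array([1.0, 0.2, 0.0, 0.5]),
    "teacher": np.array([1.1, 0.9, 0.1, 0.4]),
    "sing": np.array([0.3, 0.1, 0.8, 0.2]),
}

# Estimate the "-er" derivational shift from a known base/derived pair.
shift_er = vec["teacher"] - vec["teach"]

# Predict the embedding of an unseen derived form compositionally:
# this only works if the shift is systematic across base words.
predicted_singer = vec["sing"] + shift_er
print(predicted_singer)
```

In published work the shift is usually a function (e.g., a matrix) fit over many base/derived pairs rather than a single difference vector; the single-pair version above just makes the learnability assumption concrete.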
“…Merging these different inflectional forms into a single term is called stemming or lemmatization [47]. Stemming implements a heuristic-based technique [48] that removes characters at the end of a word, while lemmatization employs principled approaches to reduce inflectional forms to a common base form [49] through recursive processing in different layers. Miller [50] extracted words from the WordNet dictionary, but this resource was limited to serving the convenience of human readers, combining traditional computing with lexicographic information.…”
Section: Lemmatization
confidence: 99%
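The contrast drawn in the statement above can be sketched in a few lines: a heuristic stemmer strips suffixes by rule (and may produce non-words), while a lemmatizer maps inflected forms to base forms via a principled lookup. The suffix rules and the tiny lemma table below are illustrative assumptions, standing in for real resources such as WordNet.

```python
def stem(word):
    """Heuristic stemmer: strip one common English suffix by rule."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Toy lemma dictionary standing in for a lexicographic resource.
LEMMAS = {"ran": "run", "running": "run", "mice": "mouse", "better": "good"}

def lemmatize(word):
    """Dictionary-based lemmatization: map to a canonical base form."""
    return LEMMAS.get(word, word)

print(stem("running"))       # "runn" -- heuristic stripping can yield non-words
print(lemmatize("running"))  # "run"  -- principled mapping to the lemma
```

The non-word output of `stem` is deliberate: it shows why stemming is cheap but coarse, whereas lemmatization needs a dictionary (or morphological analysis) to recover the true base form.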