Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 1 - EMNLP '09 2009
DOI: 10.3115/1699510.1699553

Deriving lexical and syntactic expectation-based measures for psycholinguistic modeling via incremental top-down parsing

Abstract: A number of recent publications have made use of the incremental output of stochastic parsers to derive measures of high utility for psycholinguistic modeling, following the work of Hale (2001). In this paper, we present novel methods for calculating separate lexical and syntactic surprisal measures from a single incremental parser using a lexicalized PCFG. We also present an approximation to entropy measures that would otherwise be intractable to calculate for a grammar of that size. Empirical results demonst…
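
For readers unfamiliar with the term, the surprisal of a word in the sense of Hale (2001) is the negative log of the ratio between the prefix probability after the word and the prefix probability before it. The sketch below illustrates only that general definition; it is not the paper's parser, and it does not reproduce the paper's separate lexical and syntactic measures. The function name and the prefix probabilities are hypothetical, chosen for illustration.

```python
import math


def surprisal_from_prefix_probs(prefix_probs):
    """Return per-word surprisal in bits.

    prefix_probs[i] is assumed to be the probability of the first i + 1
    words of a sentence under some incremental language model or parser.
    The empty prefix is taken to have probability 1.
    """
    surprisals = []
    prev = 1.0
    for p in prefix_probs:
        # A prefix cannot be more probable than the shorter prefix it extends.
        if not 0.0 < p <= prev:
            raise ValueError("prefix probabilities must be positive and non-increasing")
        surprisals.append(-math.log2(p / prev))
        prev = p
    return surprisals


# Made-up prefix probabilities for a three-word sentence (illustration only).
example = [0.5, 0.125, 0.0625]
print(surprisal_from_prefix_probs(example))  # [1.0, 2.0, 1.0]
```

The second word is the most surprising here because it reduces the prefix probability by the largest factor (from 0.5 to 0.125, a factor of four, hence two bits).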

Cited by 125 publications (184 citation statements). References 34 publications (55 reference statements).
“…In contrast, it is less clear exactly how reading times relate to the lexicalized syntactic surprisal metric we used here. Roark et al (2009) reported a significant effect of this type of surprisal on self-paced reading times, but only for content words once function words had been removed from their analysis. Importantly, however, Roark et al used self-paced reading, which is slower and tends to index later and more strategic language processes compared to eyetracking.…”
Section: Discussion (citation type: mentioning)
confidence: 95%
“…Log-transformed lexical frequency counts were generated from the SUBTLEXus corpus (Brysbaert and New, 2009). Each sentence was parsed using Roark's (Roark, 2001;Roark et al, 2009) incremental top-down PCFG parser trained on the Wall Street Journal corpus of the Penn Treebank (Marcus et al, 1993) to generate syntactic surprisal values for each word.…”
Section: Language Statistics (citation type: mentioning)
confidence: 99%
“…Syntactic surprisal and lexical surprisal are calculated to account for high surprisal scores (Roark et al, 2009). As Roark et al (2009) mentions, a word may surprise because it is unconventional, or because it occurs in an unusual context.…”
Section: Methods (citation type: mentioning)
confidence: 99%