1993
DOI: 10.21236/ada273556

Building a Large Annotated Corpus of English: The Penn Treebank

Abstract: Contact the Linguistic Data Consortium, 441 Williams Hall, University of Pennsylvania, Philadelphia, PA 19104-6305, or send email to ldc@unagi.cis.upenn.edu for more information.

Cited by 3,855 publications
(3,406 citation statements)
References 10 publications
“…The parser used was Roark's (2001) incremental top-down parser. This is a probabilistic parser trained on the Penn Treebank (Marcus, Santorini, & Marcinkiewicz, 1993), a corpus of English text manually annotated with phrase structure trees. Only the Wall Street Journal section of the Penn Treebank was used for training.…”
Section: Methods (mentioning)
confidence: 99%
“…Log-transformed lexical frequency counts were generated from the SUBTLEXus corpus (Brysbaert and New, 2009). Each sentence was parsed using Roark's (Roark, 2001; Roark et al., 2009) incremental top-down PCFG parser trained on the Wall Street Journal corpus of the Penn Treebank (Marcus et al., 1993) to generate syntactic surprisal values for each word.…”
Section: Language Statistics (mentioning)
confidence: 99%
“…These 1127 tokens were then combined with 1263 tokens from the Switchboard Corpus (Godfrey et al 1992), and 404 written tokens from the Treebank Wall-Street Journal (Marcus et al 1993). …”
Section: The Data From NZ and US English (mentioning)
confidence: 99%