2004
DOI: 10.3758/bf03195586

Case-sensitive letter and bigram frequency counts from large-scale English corpora

Abstract: We tabulated upper- and lowercase letter frequency using several large-scale English corpora (~183 million words in total). The results indicate that the relative frequencies for upper- and lowercase letters are not equivalent. We report a letter-naming experiment in which uppercase frequency predicted response time to uppercase letters better than did lowercase frequency. Tables of case-sensitive letter and bigram frequency are provided, including common nonalphabetic characters. Because subjects are sensitive …
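The tabulation the abstract describes amounts to counting characters and adjacent-character pairs without folding case. The Python sketch below is not the authors' procedure: it assumes whitespace-delimited tokens, skips whitespace, and keeps punctuation so that nonalphabetic characters are tabulated alongside letters. The function names `char_and_bigram_counts` and `relative_frequency` are hypothetical. Comparing the uppercase and lowercase relative-frequency tables it produces is one way to check whether the two distributions match.

```python
import string
from collections import Counter


def char_and_bigram_counts(text: str) -> tuple[Counter, Counter]:
    """Case-sensitive counts of single characters and adjacent-character bigrams.

    Assumptions (hypothetical, not the paper's exact procedure): whitespace is
    skipped, and bigrams are formed only within whitespace-delimited tokens, so
    punctuation such as "." or "!" is counted alongside letters.
    """
    chars: Counter = Counter()
    bigrams: Counter = Counter()
    for token in text.split():
        chars.update(token)  # 'A' and 'a' remain distinct keys
        bigrams.update(a + b for a, b in zip(token, token[1:]))
    return chars, bigrams


def relative_frequency(counts: Counter, keys: str) -> dict[str, float]:
    """Share of each key among the given keys only (e.g. uppercase letters)."""
    total = sum(counts[k] for k in keys)
    return {k: (counts[k] / total if total else 0.0) for k in keys}


if __name__ == "__main__":
    chars, bigrams = char_and_bigram_counts("The Cat sat. The dog ran!")
    print(chars["T"], chars["t"], chars["C"], chars["c"])  # 2 2 1 0
    print(bigrams["Th"], bigrams["th"])                    # 2 0
    upper = relative_frequency(chars, string.ascii_uppercase)
    lower = relative_frequency(chars, string.ascii_lowercase)
    # 'T' is 2 of 3 uppercase letters but 't' is only 2 of 15 lowercase ones.
    print(round(upper["T"], 3), round(lower["t"], 3))      # 0.667 0.133
```

Normalizing within each case separately, rather than over all characters, is what allows the uppercase and lowercase distributions to be compared directly.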

Cited by 98 publications (67 citation statements). References 33 publications (40 reference statements).
“…The impact of a writing system's grapho-phonological characteristics on reading and writing performance led several investigators to make use of published word-frequency counts to derive new sets of computations geared to specialized research (e.g., Berndt, D'Autrechy, & Reggia, 1994; De Cara & Goswami, 2002; Gahl, Jurafsky, & Roland, 2004; Jones & Mewhort, 2004; Kessler & Treiman, 1997; Novick & Sherman, 2004; Peereman & Content, 1999; Stanback, 1992; Tamaoka & Makioka, 2004). Unfortunately, there are very few quantitative descriptions of the orthographic and phonological properties of French words that are suitable for studying literacy acquisition.…”
Section: Lexical Databases for the Study of Literacy Acquisition (mentioning)
confidence: 99%
“…Session t1 consists of measuring knowledge of 24 of the 30 letter forms (grafías) of the Spanish alphabet, namely the simple letter forms that are associated with a single grapheme (except «h», «k», and «w»). We chose to assess recognition of lowercase letter forms, since these are more frequent in texts (Jones & Mewhort, 2004). Using BIL and PROLEC-R increases the number of items assessed for the simple vowel and consonant letter forms of the Spanish code.…”
Section: Procedure (unclassified)
“…(21) and $b(c_i, c_j)$ is a frequency count of joint occurrences of character pairs $(c_i, c_j)$ on a logarithmic scale. For the estimation of $b(\cdot, \cdot)$, the case-sensitive bigram counts from the NYT corpus [42] are used. This function gives a high score to character pairs that frequently appear in the lexicon, while character pairs that never occur receive no score.…”
Section: Text Line Verification (mentioning)
confidence: 99%
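The bigram score $b(c_i, c_j)$ in the text-line-verification excerpt can be sketched as a lookup into case-sensitive bigram counts followed by a log transform. This is an illustration under stated assumptions, not the cited paper's implementation: the transform $\log(1 + n)$ is chosen here so that pairs never seen in the corpus score exactly 0, the toy counts stand in for the NYT counts, and `make_bigram_score` is a hypothetical name.

```python
import math
from collections import Counter


def make_bigram_score(bigram_counts: Counter):
    """Return b(c_i, c_j) = log(1 + count) over case-sensitive character pairs.

    The log(1 + n) form is an assumption: it keeps the score on a logarithmic
    scale while giving unseen pairs (count 0) a score of exactly 0.
    """
    def b(c_i: str, c_j: str) -> float:
        # Counter returns 0 for missing keys, so unseen pairs need no special case.
        return math.log1p(bigram_counts[c_i + c_j])

    return b


if __name__ == "__main__":
    # Toy counts standing in for corpus bigram counts (illustrative only).
    counts = Counter({"Th": 120, "th": 900})
    b = make_bigram_score(counts)
    print(b("t", "h") > b("T", "h") > b("q", "z"))  # True
    print(b("q", "z"))                               # 0.0
```

Because the lookup key is `c_i + c_j` without case folding, `b("T", "h")` and `b("t", "h")` draw on separate counts, which is what a case-sensitive bigram table provides.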