2002
DOI: 10.3758/bf03195456

Using Internet search engines to estimate word frequency

Cited by 70 publications (47 citation statements)
References 13 publications
“…Four of the selected taboo words are not found in the traditional word norms. However, Blair, Urland, and Ma (2002) have shown that Internet search engines produce results comparable to those of Kučera and Francis's (1967) norms. Furthermore, they reported that among the search engines, Alta Vista was most highly correlated with the Kučera and Francis norms.…”
Section: Methods (mentioning)
confidence: 99%
“…The search engine returns the number of hits for the word, summing over its various case-inflected forms, unlike search engines such as Google, which did so for the most common Russian words but not for less frequent ones (at least this was the case when the experiment was run and the frequencies were obtained). The technique of using search engines for frequency estimates is discussed by Blair et al. (2002). There are a few Zasorina (1977), Sharoff (2005), and the Russian National Corpus (RNC), but they do not include a wide range of compound words.…”
Section: Methods (mentioning)
confidence: 99%
“…One hypothesis is that, although the (unX)able meaning is generally the dominant meaning, the unX word may sometimes be fairly infrequent and thus neither meaning may be dominant for those words. Frequency counts from standard databases seemed quite unreliable (e.g., unwrap was not a word in a major corpus), so instead we used the number of Google hits as the frequency measure (Blair, Urland, & Ma, 2002); the median value for the unX words was 842,000, and they ranged from 34,800 to 33,500,000 (see Table 5). The log of this frequency explained a sizeable amount of the variability between items in the go-past measures.…”
Section: Methods (mentioning)
confidence: 99%