1995
DOI: 10.1080/09296179508590051
Good-Turing frequency estimation without tears*

Cited by 244 publications (167 citation statements)
References 16 publications
“…From the full version of Table 1 we have N = Σ_r r·N_r = 1320515 and N_1 = 103978. Thus the Turing-Good estimate [8] of the amount of the probability mass missing is N_1/N ≈ 0.079, or 7.9%. This tells us that our estimate of the distribution of login frequencies is reasonably accurate, in that the bulk of the mass has been captured.…”
Section: How Many Login URLs Are There? (mentioning, confidence: 99%)
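The quoted calculation is simply the Good-Turing rule that the probability mass of unseen items is estimated by N_1/N, the number of singletons divided by the total count. A minimal sketch of that arithmetic in Python, assuming the input is a mapping from items (here, login URLs) to observed counts:

from collections import Counter

def missing_mass(counts):
    """Good-Turing estimate of the unseen probability mass, given item -> count."""
    n_r = Counter(counts.values())                # N_r: number of items seen exactly r times
    n_total = sum(r * n for r, n in n_r.items())  # N = sum over r of r * N_r
    return n_r.get(1, 0) / n_total                # N_1 / N

# With the totals quoted above (N_1 = 103978, N = 1320515):
print(103978 / 1320515)  # ≈ 0.0787, i.e. roughly 7.9% of the mass is unseen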
“…The standard means of estimating the probability mass of unseen species in a limited observation is the Good-Turing estimate [8]. From the full version of Table 1 we have N = Σ_r r·N_r = 1320515 and N_1 = 103978.…”
Section: How Many Login URLs Are There? (mentioning, confidence: 99%)
“…It also allows for missing bins and for the fact that the observed numbers are noisy estimates (i.e., subject to measurement error). These calculations are considerably more complicated (see Gale & Sampson, 1995, for an introduction), but can be circumvented by making use of existing software packages.…”
Section: The Good-Turing Algorithm (mentioning, confidence: 99%)
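The quoted passage does not name a particular package, but as one illustration of the "existing software packages" route, NLTK ships a Simple Good-Turing distribution. The class name and signature below reflect our understanding of that library and should be checked against its documentation:

from nltk.probability import FreqDist, SimpleGoodTuringProbDist

def smoothed_dist(tokens, bins=None):
    """Simple Good-Turing probability distribution over the types in `tokens`.

    `bins` is the assumed total number of types (seen plus unseen); if omitted,
    NLTK falls back to a default based on the observed vocabulary.
    """
    fd = FreqDist(tokens)
    # The internal log-linear fit needs a reasonably large sample; on toy data
    # NLTK may warn that it could not find a good fit.
    return SimpleGoodTuringProbDist(fd, bins=bins)

# Usage (illustrative): sgt = smoothed_dist(corpus_tokens)
#                       sgt.prob("the"), sgt.prob("some-unseen-word")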
“…We decided to use the frequencies from the subtitle corpus, because we think it gives a more accurate image of everyday language, which is the language FFL teaching is mainly concerned with. The frequencies were changed into probabilities, and smoothed with the Simple Good-Turing algorithm described by Gale and Sampson (1995). This step is necessary to solve another well-known problem in language models: the appearance in a new text of previously unseen lemmas.…”
Section: The Language Model: Probabilities and Smoothing (mentioning, confidence: 99%)
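For readers who want to see what the Simple Good-Turing smoothing referred to above involves, the core of Gale and Sampson's (1995) procedure can be sketched compactly. The sketch below is a simplification: it applies the log-linear estimate at every frequency and omits the paper's Turing/LGT switch rule and variance test, so it is illustrative rather than a faithful reimplementation.

import math
from collections import Counter

def simple_good_turing(counts):
    """Simplified Simple Good-Turing smoothing (after Gale & Sampson, 1995).

    `counts` maps each observed type to its frequency. Returns (p0, probs):
    p0 is the total probability mass reserved for unseen types, and probs maps
    each seen type to its smoothed probability.
    """
    n_r = Counter(counts.values())          # frequency of frequencies, N_r
    rs = sorted(n_r)                        # observed frequencies r (needs several distinct values)
    n_total = sum(r * n_r[r] for r in rs)   # N = sum over r of r * N_r
    p0 = n_r.get(1, 0) / n_total            # unseen mass = N_1 / N

    # Z_r averages N_r over the gap to neighbouring nonzero r (handles missing bins).
    z = {}
    for i, r in enumerate(rs):
        q = rs[i - 1] if i > 0 else 0
        t = rs[i + 1] if i + 1 < len(rs) else 2 * r - q
        z[r] = n_r[r] / (0.5 * (t - q))

    # Least-squares fit of log Z_r = a + b * log r.
    xs = [math.log(r) for r in rs]
    ys = [math.log(z[r]) for r in rs]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar

    def s(r):
        return math.exp(a + b * math.log(r))  # smoothed N_r

    # Adjusted counts r* = (r + 1) * S(r + 1) / S(r); renormalise so that the
    # probabilities of seen types sum to 1 - p0.
    r_star = {r: (r + 1) * s(r + 1) / s(r) for r in rs}
    seen_total = sum(r_star[r] * n_r[r] for r in rs)
    probs = {w: (1 - p0) * r_star[c] / seen_total for w, c in counts.items()}
    return p0, probs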