1995
DOI: 10.1017/s1351324900000139

Poisson mixtures

Abstract: Shannon (1948) showed that a wide range of practical problems can be reduced to the problem of estimating probability distributions of words and ngrams in text. It has become standard practice in text compression, speech recognition, information retrieval and many other applications of Shannon's theory to introduce a “bag-of-words” assumption. But obviously, word rates vary from genre to genre, author to author, topic to topic, document to document, section to section, and paragraph to paragraph. The proposed …
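
The abstract is truncated, so the specific model the paper proposes is not shown here. As a hedged illustration of the general idea only, the sketch below compares a single Poisson (the bag-of-words view, one fixed rate everywhere) with one standard Poisson mixture, the negative binomial, in which the per-document rate is gamma-distributed. The choice of mixture, the function names (`poisson_pmf`, `negbin_pmf`, `tail`) and the parameter values are assumptions for illustration, not taken from the paper.

```python
import math
from functools import partial


def poisson_pmf(k: int, rate: float) -> float:
    """P(X = k) under a single Poisson with mean `rate` (occurrences per document)."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def negbin_pmf(k: int, mean: float, shape: float) -> float:
    """P(X = k) under a negative binomial: a Poisson whose rate is
    gamma-distributed across documents, with the given overall mean.

    A small `shape` means the rate varies a lot from document to document;
    as `shape` grows, the mixture approaches a single Poisson with the same mean.
    """
    p = shape / (shape + mean)
    log_coef = math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
    return math.exp(log_coef + shape * math.log(p) + k * math.log1p(-p))


def tail(pmf, k_min: int) -> float:
    """P(X >= k_min), computed as 1 minus the probability of smaller counts."""
    return 1.0 - sum(pmf(k) for k in range(k_min))


if __name__ == "__main__":
    mean, shape = 0.5, 0.1  # hypothetical values for a "bursty" content word
    pois = partial(poisson_pmf, rate=mean)
    nb = partial(negbin_pmf, mean=mean, shape=shape)
    print(f"P(0):   Poisson {pois(0):.3f}   mixture {nb(0):.3f}")
    print(f"P(>=5): Poisson {tail(pois, 5):.5f}   mixture {tail(nb, 5):.5f}")
```

With the same mean rate, the mixture assigns more probability to documents with no occurrences and far more to documents with many occurrences, which is one way to capture the document-to-document variation in word rates that a single Poisson ignores.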

Cited by 223 publications (159 citation statements). References: 11 publications.

Citation statements:
“…4 is one suggestion for estimating tag salience, which we evaluate experimentally in Section 3. Further alternative estimations are possible, for instance by relatively straight-forward extensions to IDF, such as RIDF [2], or by more elaborate approximations of tag topicality, such as Zhou et al's approach [6] that uses Bayesian Inference.…”
Section: Query Expansion With Social Tags As Logical Inference (mentioning)
confidence: 99%
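
RIDF (residual IDF) is commonly defined as the observed IDF of a word minus the IDF that a single Poisson would predict from the word's total collection frequency. The sketch below is a minimal, hedged illustration of that definition; the function names and the example counts are hypothetical, and it is not the citing paper's salience estimator.

```python
import math


def idf(df: int, n_docs: int) -> float:
    """Observed IDF: -log2 of the fraction of documents that contain the word."""
    return -math.log2(df / n_docs)


def poisson_idf(cf: int, n_docs: int) -> float:
    """IDF predicted by a single Poisson from the collection frequency cf.

    With rate = cf / n_docs, a Poisson gives
    P(word appears at least once in a document) = 1 - exp(-rate).
    """
    rate = cf / n_docs
    return -math.log2(-math.expm1(-rate))  # -expm1(-x) == 1 - exp(-x), numerically stable


def residual_idf(df: int, cf: int, n_docs: int) -> float:
    """RIDF: how much more concentrated the word is than a Poisson predicts."""
    return idf(df, n_docs) - poisson_idf(cf, n_docs)


if __name__ == "__main__":
    n_docs = 1000
    # Two hypothetical words, each occurring cf = 100 times in the collection.
    print(residual_idf(df=95, cf=100, n_docs=n_docs))  # spread evenly: RIDF near 0
    print(residual_idf(df=20, cf=100, n_docs=n_docs))  # clumped: RIDF well above 0
```

Both words have the same collection frequency, but the one packed into fewer documents gets a much higher RIDF, which is why RIDF is a natural candidate for scoring topical words or tags.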
“…A statistics which is frequently used to model the distribution of words in texts is Poisson or, alternatively, a mixture of Poissons (cf. [2]). While the distribution of function words is close to the expected distribution under these models, good keyword candidates deviate significantly from it.…”
Section: Key Word Detection and Extraction (mentioning)
confidence: 99%
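
The excerpt does not say how the deviation from the Poisson is measured. One simple, standard check, swapped in here as an assumption, is the variance-to-mean ratio (index of dispersion) of a word's per-document counts: a Poisson has variance equal to its mean, so function words should sit near or below 1 while keyword candidates sit well above it. The function names, the threshold of 2.0 and the counts below are all hypothetical.

```python
from __future__ import annotations

from statistics import fmean


def dispersion_index(counts: list[int]) -> float:
    """Variance-to-mean ratio of a word's per-document counts.

    Values near 1 are Poisson-like; values far above 1 mean the word
    clusters in a few documents (over-dispersion).
    """
    mean = fmean(counts)
    if mean == 0:
        raise ValueError("word never occurs; dispersion is undefined")
    variance = fmean((c - mean) ** 2 for c in counts)  # population variance
    return variance / mean


def keyword_candidates(counts_by_word: dict[str, list[int]], threshold: float = 2.0) -> list[str]:
    """Words whose dispersion exceeds a (hypothetical) threshold, most dispersed first."""
    scored = {w: dispersion_index(c) for w, c in counts_by_word.items() if any(c)}
    return sorted((w for w, d in scored.items() if d > threshold), key=scored.get, reverse=True)


if __name__ == "__main__":
    counts_by_word = {  # made-up counts over eight documents
        "the": [3, 2, 4, 3, 3, 2, 4, 3],
        "poisson": [0, 0, 0, 12, 0, 0, 0, 0],
    }
    for word, counts in counts_by_word.items():
        print(word, round(dispersion_index(counts), 3))
    print(keyword_candidates(counts_by_word))
```

The ratio ignores document length; a fuller treatment would normalize by length or fit a Poisson mixture directly, as the surrounding literature does.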
“…Documents which are assigned keywords of a wide variety might also be searched by such a wide variety of search terms. 4 Ontological support of searching Current eLearning systems offer only full-text or keyword-based search facilities. We will outline in this section the steps we took in order to implement an ontology search facility and also describe planned extensions for crosslingual search.…”
Section: Discussion (mentioning)
confidence: 99%