1968
DOI: 10.1002/asi.5090190409

Statistical generation of a technical vocabulary

Abstract: The results of an experiment in the use of statistical techniques for extracting a technical vocabulary from document texts are presented and discussed.

Cited by 20 publications (14 citation statements) · References 1 publication
“…In computing the combined scores given in equations (16) and (17), we have employed an optimization procedure to choose the coefficients. In such a computation there is always the possibility of overtraining, so that the results are applicable only to the particular data set on which we have done the training.…”
Section: Results · mentioning · confidence: 99%
“…It has been proposed that non-content-bearing terms are well modeled by a single Poisson distribution, whereas content-bearing terms require a two-Poisson or some more complicated model. [15][16][17] In fact, one of our scoring methods is based on the degree that a term's distribution deviates from a Poisson distribution. The greater this deviation, the more likely that the term is a useful one.…”
mentioning · confidence: 99%
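The idea that content-bearing terms deviate from a single Poisson distribution can be made concrete with a small calculation. The sketch below uses one common deviation measure (sometimes called residual IDF), chosen here as an assumption: it is not necessarily the scoring method of the citing authors or of Stone & Rubinoff. Under a single Poisson model with mean λ occurrences per document, the expected fraction of documents containing a term is 1 − e^(−λ); a content-bearing term tends to cluster in fewer documents than that, which yields a positive score. The function name `poisson_deviation` and the token-list input format are hypothetical.

```python
import math

def poisson_deviation(docs, term):
    """Observed IDF minus the IDF predicted by a single Poisson model.

    docs: list of documents, each a list of tokens (hypothetical input format).
    Positive: the term occurs in fewer documents than Poisson predicts (clustered).
    Negative: the term is spread more evenly than Poisson predicts.
    """
    n_docs = len(docs)
    counts = [doc.count(term) for doc in docs]
    cf = sum(counts)                        # collection frequency
    df = sum(1 for c in counts if c > 0)    # document frequency
    if df == 0:
        raise ValueError("term does not occur in the collection")
    lam = cf / n_docs                       # mean occurrences per document
    expected_df_frac = 1.0 - math.exp(-lam) # P(count >= 1) under Poisson(lam)
    observed_idf = -math.log2(df / n_docs)
    expected_idf = -math.log2(expected_df_frac)
    return observed_idf - expected_idf

# Toy collection: "tensor" is concentrated in one document, "the" is spread evenly.
docs = [
    ["the", "tensor", "tensor", "tensor", "tensor"],
    ["the", "cat"],
    ["the", "dog"],
    ["the", "model"],
]
for term in ("the", "tensor"):
    print(term, poisson_deviation(docs, term))
```

Both terms occur four times in total, so they share the same λ; only their spread across documents differs. The clustered term scores positive and the evenly spread function word scores negative, which matches the intuition in the quoted passage that greater deviation from Poisson signals a more useful term.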
“…(Bookstein & Swanson, 1974;1975), (Harter, 1975), (Stone & Rubinoff, 1968), (Dennis, 1967), (Damerau, 1965). It can nevertheless be quite illustrative to support the intuitive plausibility of the view of technical language stated above with a simple logical test, in which one observes whether a particular word does or does not occur in the different sub-corpora of a text collection.…”
Section: Fagsprogsopfattelse (Conception of technical language) · unclassified
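The citing passage proposes a simple logical test: checking whether a given word does or does not occur in each sub-corpus of a text collection. A minimal sketch of that presence/absence test follows, assuming each sub-corpus is a list of tokens keyed by a name. The sub-corpus names, the toy data, and the helper names `occurrence_profile` and `exclusive_to` are hypothetical illustrations, not taken from the citing work.

```python
def occurrence_profile(subcorpora):
    """Map each word to the set of sub-corpora in which it occurs at least once.

    subcorpora: dict mapping a sub-corpus name to a list of tokens (hypothetical format).
    """
    profile = {}
    for name, tokens in subcorpora.items():
        for word in set(tokens):
            profile.setdefault(word, set()).add(name)
    return profile

def exclusive_to(subcorpora, target):
    """Words that occur in the `target` sub-corpus and in no other sub-corpus."""
    profile = occurrence_profile(subcorpora)
    return sorted(word for word, names in profile.items() if names == {target})

# Hypothetical toy collection with three sub-corpora.
subcorpora = {
    "physics": ["the", "quantum", "field", "of", "energy"],
    "law": ["the", "statute", "of", "court"],
    "news": ["the", "court", "energy", "of", "today"],
}
print(exclusive_to(subcorpora, "physics"))
print(exclusive_to(subcorpora, "law"))
```

Words that occur in every sub-corpus ("the", "of") behave like general vocabulary, while words confined to one sub-corpus are candidates for that field's technical vocabulary. The test is purely binary: it ignores how often a word occurs, which is what the frequency-based Poisson measures discussed in the other citation statements add.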
“…It is possible to think of such features as having been produced by a conceptual feature generator that produces features at a fixed average rate with the interval between feature occurrences a random exponential variable. The Poisson model has been suggested as providing a satisfactory description of the occurrence frequencies of natural language terms [12,23,36]. Use of this distribution may be helpful if advantage is taken of content analytic methods for counting the number of times particular positive or negative terms or expressions occur in a message [6,18].…”
Section: Poisson Independently Distributed Features · mentioning · confidence: 99%
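The feature generator described in the last passage, in which features arrive at a fixed average rate with exponentially distributed gaps between occurrences, produces Poisson-distributed counts in any fixed-length window. The simulation below illustrates that link under stated assumptions: the rate (0.02 features per token), the window length (250 tokens), and the helper names are hypothetical values chosen for illustration only. For a Poisson count the mean and variance are equal, here both close to 0.02 × 250 = 5.

```python
import random
import statistics

def count_features(rate, length, rng):
    """Count features in a window of `length` when gaps between features are Exponential(rate)."""
    t = rng.expovariate(rate)   # time (position) of the first feature
    n = 0
    while t <= length:
        n += 1
        t += rng.expovariate(rate)  # exponential gap to the next feature
    return n

def simulate(rate=0.02, length=250, trials=20000, seed=1):
    """Return the sample mean and variance of feature counts over many windows."""
    rng = random.Random(seed)
    counts = [count_features(rate, length, rng) for _ in range(trials)]
    return statistics.mean(counts), statistics.pvariance(counts)

mean, var = simulate()
print(f"mean={mean:.3f} variance={var:.3f}")
```

A mean close to the variance is the signature of the single Poisson model; a term whose counts show a variance well above the mean would, in the terms of the earlier passage, deviate from Poisson and be a candidate content-bearing term.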