1991
DOI: 10.1145/125187.125189

A probabilistic learning approach for document indexing

Abstract: We describe a method for probabilistic document indexing using relevance feedback data that has been collected from a set of queries. Our approach is based on three new concepts: (1) Abstraction from specific terms and documents, which overcomes the restriction of limited relevance information for parameter estimation. (2) Flexibility of the representation, which allows the integration of new text analysis and knowledge-based methods in our approach as well as the consideration of document structures or differe…

Cited by 146 publications (102 citation statements)
References 16 publications
“…A wide variety of learning approaches have been applied to TC, to name a few, Bayesian classification (Lewis and Ringuette 1994; Domingo and Pazzani 1996; Larkey and Croft 1996; Koller and Sahami 1997; Lewis 1998), decision trees (Weiss, Apte et al; Fuhr and Buckley 1991; Cohen and Hirsh 1998; Li and Jain 1998), decision rule classifiers such as CHARADE (Moulinier and Ganascia 1996), or DL-ESC (Li and Yamanishi 1999), or RIPPER (Cohen and Hirsh 1998), or SCAR (Moulinier, Raskinis et al 1996), or SCAP-1 (Apté, Damerau et al 1994), multi-linear regression models (Yang and Chute 1994; Yang and Liu 1999), Rocchio method (Hull 1994; Ittner, Lewis et al 1995; Sable and Hatzivassiloglou 2000), Neural Networks (Schütze, Hull et al 1995; Wiener, Pedersen et al 1995; Dagan, Karov et al 1997; Ng, Goh et al 1997; Lam and Lee 1999; Ruiz and Srinivasan 1999), example based classifiers (Creecy 1991; Masand, Linoff et al 1992; Larkey 1999), support vector machines (Joachims 1998), Bayesian inference networks (Tzeras and Hartmann 1993; Wai and Fan 1997; Dumais, Platt et al 1998), genetic algorithms (Masand 1994; Clack, Farringdon et al 1997), and maximum entropy modelling (Manning and Schütze 1999).…”
Section: Machine Learning Approaches To Text Categorization (mentioning)
confidence: 99%
“…combining many words as one index, for example "artificial intelligence" or "data mining" (Fuhr and Buckley 1991; Tzeras and Hartmann 1993; Schütze, Hull et al 1995). These indexes can be generated either manually or automatically.…”
Section: Indexing (mentioning)
confidence: 99%
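
The multi-word indexing described in the statement above can be sketched as follows. This is only an illustration, assuming a small hand-written phrase vocabulary; the names `PHRASES` and `index_terms` are hypothetical and do not come from the cited papers, and an automatically generated vocabulary could be substituted.

```python
import re

# Hypothetical phrase vocabulary (an assumption for this sketch). It could be
# curated manually or produced automatically by some other procedure.
PHRASES = {("artificial", "intelligence"), ("data", "mining")}


def index_terms(text, phrases=PHRASES):
    """Return index terms for text, merging known two-word phrases into one term."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in phrases:
            # Two adjacent tokens form a known phrase: emit a single index term.
            terms.append(" ".join(pair))
            i += 2
        else:
            terms.append(tokens[i])
            i += 1
    return terms
```

Calling `index_terms("Data mining and artificial intelligence")` yields three index terms instead of five single words, because each known phrase is kept as one unit.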
“…• DIAAF: The Darmstadt Indexing Approach (DIA) [11] was originally "developed for automatic indexing with a prescribed indexing vocabulary" [12]. In a machine learning context, Sebastiani [23] argues that this approach "considers properties (of terms, documents, categories, or pairwise relationships among these) as basic dimensions of the learning space".…”
Section: Statistical Feature Selection (mentioning)
confidence: 99%
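
The idea of using properties, rather than specific terms or documents, as dimensions of the learning space can be illustrated with a small sketch. The three features below are assumptions chosen for illustration, not the feature set of the indexed paper, and `describe` is a hypothetical helper name.

```python
from collections import Counter


def describe(term, doc_tokens):
    """Describe a (term, document) pair by its properties, not its identity."""
    counts = Counter(doc_tokens)
    return (
        counts[term],                     # occurrences of the term in the document
        max(counts.values(), default=0),  # frequency of the most frequent term
        len(doc_tokens),                  # document length in tokens
    )


doc_a = ["index", "index", "term", "model"]
doc_b = ["query", "query", "rank", "score"]

# Two different term-document pairs map to the same feature vector.
same_point = describe("index", doc_a) == describe("query", doc_b)
```

Because unrelated term-document pairs can land on the same point in this feature space, relevance judgments for one pair can inform estimates for the other, which is one way to read the abstract's remark about overcoming limited relevance information.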
“…Document indexing is defined as the task of assigning terms to documents for retrieval purposes [11]. The process consists of two generic steps: extracting the subject matter of a document, and expressing the subject matter in index terms to facilitate subject retrieval [12].…”
Section: Introduction (mentioning)
confidence: 99%