2015
DOI: 10.1002/asi.23290

An automatic approach to weighted subject indexing—an empirical study in the biomedical domain

Abstract: Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle t…

Citations: cited by 2 publications (4 citation statements)
References: 45 publications
“…Evaluating the performance of different weighting methods is not a trivial task. Our earlier studies (Lu & Mao, ; Lu et al, ) assessed the performance of different methods according to their ability to rank the major MeSH at the top. This evaluation method is cost‐effective.…”
Section: Discussion | Citation type: mentioning
confidence: 99%
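
The evaluation idea in the statement above, judging a weighting method by whether it ranks the major MeSH headings at the top, can be sketched as follows. This is a minimal illustration only: the quoted text does not name a metric, so precision at k is used here as an assumed stand-in, and the function name, heading names, and weights are all hypothetical.

```python
def precision_at_k(weights, major_terms, k):
    """Share of the k highest-weighted headings that are major headings.

    weights: dict mapping a heading to the weight a method assigned to it.
    major_terms: set of headings marked as major for the document.
    Precision at k is an assumed metric, not one named in the source.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank headings from highest to lowest weight.
    ranked = sorted(weights, key=weights.get, reverse=True)
    hits = sum(1 for term in ranked[:k] if term in major_terms)
    return hits / k


# Hypothetical weights for one document (not taken from the source).
example_weights = {"Neoplasms": 0.9, "Mutation": 0.7, "Adult": 0.2, "Humans": 0.1}
example_major = {"Neoplasms", "Mutation"}
top2 = precision_at_k(example_weights, example_major, 2)
top4 = precision_at_k(example_weights, example_major, 4)
```

A method that pushes both major headings into the first two positions scores higher at k = 2 than one that scatters them, which is the sense in which such a ranking check is cheap to run: it needs only the existing major/minor flags, not new human judgments.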
“…The weighted mutual information between a subject descriptor h and an item i in document d is applied as in Lu and Mao (): $\mathrm{MI}(h; i) = \delta(i, h)\, p(i, h) \log \frac{p(i, h)}{p(i)\, p(h)}$, where $\delta(i, h)$ is the weight of the pair <i, h>, which is obtained by: $\delta(i, h) = (\mathit{tf}_i + 0.5) \log \frac{N + 0.5}{\mathit{df}_i + 0.5} \log \frac{N + 0.5}{\mathit{df}_h + 0.5}$, where $\mathit{tf}_i$ is the frequency of the item i in the document, N is the total number of documents in the corpus, and $\mathit{df}_i$ and $\mathit{df}_h$ are the document frequencies (i.e., number of documents) of item i and subject descriptor h, respectively. The probabilities p(i, h), p(i), and p(h) are estimated by Maximum Likelihood Estimator (MLE) at the document level: $p(\iota) = \frac{\mathit{df}_\iota}{N}$ …”
Section: Methods | Citation type: mentioning
confidence: 99%
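
The weighted mutual information quoted above can be worked through in a few lines of Python. This is a sketch under stated assumptions rather than the cited authors' implementation: the quoted formula is damaged in extraction, so the weight is read here as (tf_i + 0.5) multiplied by two smoothed log factors, one for the item and one for the descriptor. The function names, the co-occurrence count `df_ih`, and the zero-count convention are all hypothetical.

```python
import math


def pair_weight(tf_i, df_i, df_h, n_docs):
    """delta(i, h) under the assumed reading of the quoted formula:
    (tf_i + 0.5) * log((N + 0.5) / (df_i + 0.5)) * log((N + 0.5) / (df_h + 0.5)).
    """
    idf_i = math.log((n_docs + 0.5) / (df_i + 0.5))
    idf_h = math.log((n_docs + 0.5) / (df_h + 0.5))
    return (tf_i + 0.5) * idf_i * idf_h


def weighted_mi(tf_i, df_i, df_h, df_ih, n_docs):
    """Weighted MI between item i and descriptor h.

    Probabilities are document-level maximum likelihood estimates,
    p(x) = df_x / N. df_ih is the number of documents containing both
    i and h (a hypothetical parameter name).
    """
    if df_ih == 0:
        # Assumed convention: a pair that never co-occurs contributes 0.
        return 0.0
    p_i = df_i / n_docs
    p_h = df_h / n_docs
    p_ih = df_ih / n_docs
    return pair_weight(tf_i, df_i, df_h, n_docs) * p_ih * math.log(p_ih / (p_i * p_h))


# Illustrative counts only (not from the source).
associated = weighted_mi(tf_i=3, df_i=50, df_h=20, df_ih=10, n_docs=1000)
independent = weighted_mi(tf_i=3, df_i=100, df_h=100, df_ih=10, n_docs=1000)
```

With these counts the first pair co-occurs ten times more often than independence would predict, so its score is positive, while the second pair co-occurs exactly as often as chance predicts and scores approximately zero regardless of its weight.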
“…Uncontrolled natural-language index terms are usually called keywords, and their use rests on the assumption that the author and the user use the same concept to describe the same phenomenon, that is, the core message of the text (Taylor, 2004). Although the use of keywords has become more common, many consider controlled subject headings necessary because of the problems caused by natural language, such as synonyms (Gross, Taylor, & Joudrey, 2015; Lappalainen, Nykyri, & Palonen, 2013; Lu & Mao, 2015).…”
Section: Asiasanat ja tiedonhaku (Subject Headings and Information Retrieval) | Citation type: unclassified