An automatic approach to weighted subject indexing—an empirical study in the biomedical domain

Lü, Kun; Mao, Jin

doi:10.1002/asi.23290

Cited by 2 publications

(4 citation statements)

References 45 publications


“…Evaluating the performance of different weighting methods is not a trivial task. Our earlier studies (Lu & Mao, ; Lu et al, ) assessed the performance of different methods according to their ability to rank the major MeSH at the top. This evaluation method is cost‐effective.…”

Section: Discussion (mentioning)

confidence: 99%

“…The weighted mutual information between a subject descriptor h and an item i in document d is applied as in Lu and Mao ():

$$\mathrm{MI}(h; i) = \delta(i, h)\, p(i, h) \log \frac{p(i, h)}{p(i)\, p(h)}$$

where δ(i, h) is the weight of the pair <i, h>, which is obtained by:

$$\delta(i, h) = (\mathit{tf}_i + 0.5) \cdot \log \frac{N + 0.5}{\mathit{df}_i + 0.5} \cdot \log \frac{N + 0.5}{\mathit{df}_h + 0.5}$$

where tf_i is the frequency of the item i in the document, N is the total number of documents in the corpus, and df_i and df_h are the document frequencies (i.e., number of documents) of item i and subject descriptor h, respectively. The probabilities p(i, h), p(i), and p(h) are estimated by the Maximum Likelihood Estimator (MLE) at the document level:

$$p(\iota) = \frac{\mathit{df}_{\iota}}{N}$$

…”

Section: Methods (mentioning)

confidence: 99%
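
The formulas quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names and the argument `df_ih` (the number of documents containing both item i and descriptor h, used here to estimate p(i, h) in the same document-level way as p(ι)) are hypothetical, and the natural logarithm is assumed because the quoted passage does not give a base.

```python
import math


def pair_weight(tf_i, df_i, df_h, n_docs):
    """delta(i, h): smoothed term frequency times two smoothed IDF-style factors."""
    idf_item = math.log((n_docs + 0.5) / (df_i + 0.5))
    idf_descriptor = math.log((n_docs + 0.5) / (df_h + 0.5))
    return (tf_i + 0.5) * idf_item * idf_descriptor


def weighted_mi(tf_i, df_i, df_h, df_ih, n_docs):
    """Weighted mutual information MI(h; i) for one item/descriptor pair.

    Probabilities are document-level maximum likelihood estimates:
    p(i) = df_i / N, p(h) = df_h / N and, as an assumption of this sketch,
    p(i, h) = df_ih / N.
    """
    if df_ih == 0:
        # No co-occurrence: the term p(i, h) * log(...) is taken as 0.
        return 0.0
    p_i = df_i / n_docs
    p_h = df_h / n_docs
    p_ih = df_ih / n_docs
    delta = pair_weight(tf_i, df_i, df_h, n_docs)
    return delta * p_ih * math.log(p_ih / (p_i * p_h))


# Hypothetical counts: a corpus of 1000 documents, an item occurring 3 times
# in the current document and in 50 documents overall, a descriptor assigned
# to 20 documents, and 10 documents containing both.
score = weighted_mi(tf_i=3, df_i=50, df_h=20, df_ih=10, n_docs=1000)
```

The score is positive when the item and the descriptor co-occur more often than independence would predict, and close to zero when they are roughly independent. How the pair scores are combined into a single weight per descriptor is not stated in the quoted passage, so the sketch stops at the pair level.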

“…A more recent study by Zhang, Smith, Twidale, and Gao () revisited the need for a weighting mechanism for subject indexing. In Lu and Mao (), an automated approach that provides a concrete and cost‐effective implementation for weighted subject indexing has been proposed. Their initial experiments on a medical collection suggested the feasibility of the method.…”

Section: Introduction (mentioning)

confidence: 99%


Toward effective automated weighted subject indexing: A comparison of different approaches in different environments

Lü; Mao (2017). Asso for Info Science & Tech. Self Cite


Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag-of-words representation shows the best average performance in the full text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao
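
Two of the weighting methods named in the abstract, IDF and cosine with a bag-of-words representation, can be illustrated with a minimal sketch. The abstract does not say how either is computed, so everything here is an assumption: the function names are hypothetical, IDF is taken in its common form log(N / df), and the descriptor is represented by a hypothetical bag of words.

```python
import math
from collections import Counter


def idf_weight(df_h, n_docs):
    """Direct weight for a descriptor from its document frequency (assumed form)."""
    return math.log(n_docs / df_h)


def cosine(bag_a, bag_b):
    """Cosine similarity between two bag-of-words term-count vectors."""
    shared_terms = bag_a.keys() & bag_b.keys()
    dot = sum(bag_a[t] * bag_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in bag_a.values()))
    norm_b = math.sqrt(sum(v * v for v in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: a short document and a word profile for one descriptor.
document = Counter("aspirin reduces fever and pain in adults".split())
descriptor_profile = Counter("aspirin pain".split())

cosine_score = cosine(document, descriptor_profile)
idf_score = idf_weight(df_h=20, n_docs=1000)
```

Under these assumptions the IDF weight depends only on how rare the descriptor is in the collection, which is consistent with the abstract describing it as a quick estimate, while the cosine score depends on the content of the individual document.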


“…Uncontrolled natural-language descriptors are usually called keywords, and their use rests on the assumption that the author and the user use the same concept to describe the same phenomenon, that is, the core message of the text (Taylor, 2004). Although the use of keywords has become more common, many hold that controlled subject headings are still needed because of the problems caused by natural language, such as synonyms (Gross, Taylor, & Joudrey, 2015; Lappalainen, Nykyri, & Palonen, 2013; Lu & Mao, 2015).…”

Section: Subject Headings and Information Seeking (unclassified)

Asiasana hallussa? – Opiskelijoiden tiedonhakukäyttäytyminen ja asiasanat osana sitä

Ruokolainen (2017). INF


The study focuses on the information seeking behaviour of graduate students and the role of subject headings in it. The aim of the study was to see if students are aware of indexing. The motivation lies in the tension between the simplicity that modern students seek in information seeking and the need to use complex databases to find relevant information, where indexing plays a crucial role. The study was conducted with six qualitative interviews. The findings indicate that student information seeking behaviour is simple and intuitive and that subject headings have only a small role in it. Students lack an understanding of information seeking terminology and a profound awareness of the organisation of information, which could help them in information seeking.

Keywords: information behaviour; information seeking; subject headings; indexing; students

Students' information seeking behaviour aims for speed and ease. In their studies, however, they often have to search for information in databases more complex than Google, which requires information seeking skills. Descriptive terms, such as subject headings and keywords, are used in content description so that materials can be found by the information seeker. Surprisingly, the relationship between indexing and students' information seeking behaviour has hardly been studied. This article therefore examines whether subject headings play some role in students' natural information seeking behaviour.

The article is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence. Permanent address: https://doi.org/10.23978/inf.63188
