2018
DOI: 10.1007/s00799-018-0240-3

Fusion architectures for automatic subject indexing under concept drift

Abstract: Indexing documents with controlled vocabularies enables a wealth of semantic applications for digital libraries. Due to the rapid growth of scientific publications, machine learning based methods are required that assign subject descriptors automatically. While stability of generative processes behind the underlying data is often assumed tacitly, it is being violated in practice. Addressing this problem, this article studies explicit and implicit concept drift, that is, settings with new descriptor terms and n…
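The abstract characterizes explicit concept drift as settings with new descriptor terms. A minimal sketch of how one might quantify this on a labeled collection, assuming each document's annotation is a set of descriptor strings: find descriptors that appear in test annotations but never in training annotations, and the share of test documents affected. The function `explicit_drift` is hypothetical and not taken from the article.

```python
from typing import Iterable, List, Set, Tuple


def explicit_drift(
    train_labels: Iterable[Set[str]], test_labels: List[Set[str]]
) -> Tuple[Set[str], float]:
    """Hypothetical helper: descriptors seen only in the test period.

    Returns the set of descriptors absent from all training annotations and
    the fraction of test documents carrying at least one of them.
    """
    # Union of all descriptors assigned to training documents.
    seen: Set[str] = set().union(*train_labels)
    # Descriptors that only show up in the test period ("new descriptor terms").
    unseen = set().union(*test_labels) - seen
    # Count test documents whose annotation contains at least one unseen descriptor.
    affected = sum(1 for labels in test_labels if labels & unseen)
    share = affected / len(test_labels) if test_labels else 0.0
    return unseen, share


train = [{"Inflation", "Monetary policy"}, {"Monetary policy"}]
test = [{"Inflation", "Cryptocurrency"}, {"Monetary policy"}, {"Sharing economy"}, {"Inflation"}]
print(explicit_drift(train, test))
```

A classifier trained only on `train` cannot assign the unseen descriptors at all, which is why such terms need separate handling.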

Cited by 8 publications (8 citation statements)
References: 29 publications (105 reference statements)
“…In both cases, the result is a list of candidate subjects for the document. In order to determine the final set of suggested subjects for the document, the candidates must then be ranked and only the most promising ones retained (Medelyan, 2009;Toepfer & Seifert, 2018).…”
Section: Process Of Automated Indexing; mentioning (confidence: 99%)
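The ranking step described in this citation statement can be sketched as follows, assuming each candidate subject already carries a numeric score from some upstream lexical or associative method. The function name `select_subjects` and the parameters `k` and `min_score` are hypothetical; real systems may instead learn the cut-off.

```python
from typing import Dict, List


def select_subjects(
    candidates: Dict[str, float], k: int = 5, min_score: float = 0.0
) -> List[str]:
    """Hypothetical helper: keep the k highest-scoring candidate subjects.

    `candidates` maps a descriptor to its score; candidates below
    `min_score` are dropped even if they fall within the top k.
    """
    # Sort candidates from most to least promising.
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    # Retain at most k subjects that also clear the score threshold.
    return [subject for subject, score in ranked[:k] if score >= min_score]


scores = {"Inflation": 0.9, "Monetary policy": 0.7, "Trade": 0.2, "Labour market": 0.4}
print(select_subjects(scores, k=2))
print(select_subjects(scores, k=5, min_score=0.5))
```

Combining a fixed `k` with a score threshold is one simple design choice; it limits the number of suggestions per document while still discarding weak candidates.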
“…Algorithms for automated subject indexing can generally be divided into lexical and associative approaches (Toepfer & Seifert, 2018). In lexical approaches, frequently occurring or otherwise salient terms in the document are matched with terms in the vocabulary.…”
Section: Approaches; mentioning (confidence: 99%)
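The lexical approach quoted above can be illustrated with a minimal sketch that matches word n-grams of a document against vocabulary labels and counts how often each descriptor is hit. It assumes a vocabulary given as a mapping from lowercase labels (preferred or alternative) to descriptor IDs; the function `lexical_candidates` and the `max_ngram` parameter are hypothetical, and frequency is used as the only salience signal.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def lexical_candidates(
    text: str, vocabulary: Dict[str, str], max_ngram: int = 3
) -> List[Tuple[str, int]]:
    """Hypothetical helper: match document n-grams to vocabulary labels.

    Returns (descriptor, frequency) pairs, most frequent first; ties keep
    the order in which descriptors were first matched.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    # Try every n-gram length so multi-word labels such as "central bank" match.
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            label = " ".join(tokens[i:i + n])
            descriptor = vocabulary.get(label)
            if descriptor is not None:
                counts[descriptor] += 1
    return counts.most_common()


vocab = {"inflation": "D1", "monetary policy": "D2", "central bank": "D3"}
doc = "Monetary policy and inflation: how the central bank targets inflation."
print(lexical_candidates(doc, vocab))
```

The output is a candidate list that still has to be ranked and pruned; associative approaches would instead learn descriptor assignments from previously indexed documents.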
“…Figures 6.10c and 6.11c). This resembles a challenge because label annotations suffer from concept drift over time [TS20]. We use the years 2012 and 2013 as test documents for EconBiz and the year 2016 for IREON to obtain a 90:10 train-test ratio, as in the citation recommendation datasets described above.…”
Section: Chronological Train-test Splits; mentioning (confidence: 99%)
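The chronological split mentioned in this citation statement can be sketched as below, assuming documents are dictionaries with a `year` field. Here training uses only documents published before the first test year, so the model never sees annotations from the test period; that choice, and the function `chronological_split`, are assumptions for illustration rather than the cited work's exact procedure.

```python
from typing import Any, Dict, Iterable, List, Tuple

Document = Dict[str, Any]


def chronological_split(
    docs: Iterable[Document], test_years: Iterable[int]
) -> Tuple[List[Document], List[Document]]:
    """Hypothetical helper: split documents by publication year.

    Documents from `test_years` form the test set; documents published
    before the earliest test year form the training set.
    """
    years = set(test_years)
    first_test_year = min(years)
    train: List[Document] = []
    test: List[Document] = []
    for doc in docs:
        year = doc["year"]
        if year in years:
            test.append(doc)
        elif year < first_test_year:
            train.append(doc)
        # Later documents outside `test_years` are dropped to avoid leakage.
    return train, test


docs = [{"id": i, "year": 2000 + i % 14} for i in range(140)]
train, test = chronological_split(docs, test_years=[2012, 2013])
print(len(train), len(test))
```

Unlike a random split, this setup exposes the model to concept drift at evaluation time, because descriptor usage in the test years may differ from earlier years.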