2020
DOI: 10.2478/jdis-2020-0003
Automatic Classification of Swedish Metadata Using Dewey Decimal Classification: A Comparison of Approaches

Abstract: Purpose: With more and more digital collections of various information resources becoming available, also increasing is the challenge of assigning subject index terms and classes from quality knowledge organization systems. While the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, the paper aims to evaluate the performance of six machine learning algorithms as well as a string-matching algorithm based on characteris…

citations
Cited by 9 publications
(10 citation statements)
references
References 17 publications
“…While multi-label classification is a well-studied subject, in this paper we perform this task on a noisy data set in an expert domain, making the process more challenging. Even though the difficulty of the task is high, we achieve decent results: we achieve comparable or better scores when compared to similar studies in other domains (Golub et al, 2020; Kleppe et al, 2019). We also specifically test which pre-processing methods have a positive effect on classification, and provide the created data in an online repository.…”
Section: Introduction (supporting)
confidence: 58%
“…These characteristics are not unique to the archaeology domain, and are also often encountered in e.g. the biomedical domain (Laza et al, 2011) and library domain (Golub et al, 2020).…”
Section: Multi-label Text Classification (mentioning)
confidence: 99%
“…The evaluation is therefore performed on the unit of subject headings instead of documents, which further differentiates this work with a standard document retrieval task. Our idea is supported by Golub et al [19] who concluded that automatic subject heading assignment should never be implemented on its own; instead, a system should combine the efficiency of automatic suggestions with quality of human decisions at the final stage. They found that applying purely automatic subject heading classification does not work, because there are a large number of subject headings classes.…”
Section: Introduction (mentioning)
confidence: 89%
“…Later, Sapon-White and Hansbrough [9] showed that the dissertations with subject headings are found to be more likely to circulate than those without subject headings. Some researchers have also studied subject heading for various purposes, such as: analysing subjects listed in different subject heading [10]-[12], generating map of science [13], [14], developing simplified subject heading list [15], [16], and assigning subject heading automatically [17]-[19].…”
Section: Introduction (mentioning)
confidence: 99%