2012
DOI: 10.1007/978-3-642-33290-6_35

Evaluating the Use of Clustering for Automatically Organising Digital Library Collections

Abstract: Large digital libraries have become available over the past years through digitisation and aggregation projects. These large collections present a challenge to the new user who wishes to discover what is available in the collections. Subject classification can help in this task, however in large collections it is frequently incomplete or inconsistent. Automatic clustering algorithms provide a solution to this, however the question remains whether they produce clusters that are sufficiently cohesive a…

Cited by 13 publications (7 citation statements)
References 24 publications (25 reference statements)
“…Since its inception, the method of Chang et al (2009) has been used variously as a means of assessing topic models (Paul and Girju, 2010;Reisinger et al, 2010;Hall et al, 2012). Despite its wide acceptance, the method relies on manual annotation and has never been automated.…”
Section: Introduction (mentioning)
confidence: 99%
“…As the acceptance of topic coherence measures increases as a mean of topic model assessment (Paul and Girju, 2010;Reisinger et al, 2010;Hall et al, 2012), recent research trends focus on proposing fast and efficient models that can be scaled up to big amounts of data (Yang et al, 2015;Nguyen et al, 2015), using the whole text per document for training.…”
Section: Related Work (mentioning)
confidence: 99%
“…It is possible to receive multiple records for the same object from the same institution. 8 A quality control failure during the data ingestion process can let duplicates be published in the Europeana portal. Clustering allows us to identify these duplicates with a high degree of accuracy; often the exact same metadata appears in many fields.…”
Section: Qualitative Evaluation and Categorisation of Clusters (mentioning)
confidence: 99%
“…In the latter case, an automatic procedure would have difficulty making the distinction with other types of relations. 8 Derivative works These are objects which are derived from another one, such as reprint. Fig.…”
Section: Qualitative Evaluation and Categorisation of Clusters (mentioning)
confidence: 99%