Luit Gazendam scite author profile

Interdisciplinary Science Reviews

Malaisé

et al. 2009

Abstract. In the context of large and ever-growing archives, generating annotation suggestions automatically from textual resources related to the documents to be archived is an interesting option in theory. It could save a lot of work in the time-consuming and expensive task of manual annotation and it could help cataloguers attain a higher inter-annotator agreement. However, some questions arise in practice: what is the quality of the automatically produced annotations? How do they compare with manual annotations and with the requirements for annotation that were defined in the archive? If different from the manual annotations, are the automatic annotations wrong? In the CHOICE project, partially hosted at the Netherlands Institute for Sound and Vision, the Dutch public archive for audiovisual broadcasts, we automatically generate annotation suggestions for cataloguers. In this paper, we define three types of evaluation of these annotation suggestions: (1) a classic and strict precision/recall measure expressing the overlap between automatically generated keywords and the manual annotations, (2) a loosened precision/recall measure for which semantically very similar annotations are also considered as relevant matches, (3) an in-use evaluation of the usefulness of manual versus automatic annotations in the context of serendipitous browsing. During serendipitous browsing the annotations (manual or automatic) are used to retrieve and visualize semantically related documents.

Context. The Netherlands Institute for Sound and Vision (henceforth S&V) is in charge of archiving publicly broadcast TV and radio programs in the Netherlands. Two years ago the audiovisual production and archiving environment changed from analogue to digital data. This effectively quadrupled the inflow of archival material and, with it, the amount of work for cataloguers.
The two most important customer groups are: 1) professional users from the public broadcasters and 2) users from science and education. These typically have three kinds of user queries:

Thesaurus Based Term Ranking for Keyword Extraction

Brussee

2010

In many cases keywords from a restricted set of possible keywords have to be assigned to texts. A common way to find the best keywords is to rank terms occurring in the text according to their tf.idf value. This requires a corpus of texts from which document frequencies can be derived. In this paper we show that we can obtain results of the same quality without a background corpus, using relations between terms provided in a thesaurus.
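The corpus-based baseline mentioned in this abstract, ranking terms from a restricted keyword set by their tf.idf value, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation and not their thesaurus-based method: the function name `rank_keywords`, the toy corpus, the vocabulary, and the smoothed idf variant are all assumptions made for the example.

```python
import math
from collections import Counter


def rank_keywords(doc_tokens, corpus, vocabulary):
    """Rank the vocabulary terms occurring in doc_tokens by tf.idf.

    doc_tokens: list of tokens of the text to annotate
    corpus:     list of token lists (the background corpus)
    vocabulary: set of allowed keywords (the restricted keyword set)
    """
    n_docs = len(corpus)

    # Document frequency: in how many corpus documents each term occurs.
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))

    # Term frequency, counting only terms from the restricted vocabulary.
    tf = Counter(t for t in doc_tokens if t in vocabulary)

    scores = {}
    for term, count in tf.items():
        # Smoothed idf (an assumed variant; it avoids division by zero
        # for terms that never occur in the background corpus).
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        scores[term] = count * idf

    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data, for illustration only.
corpus = [
    ["news", "election", "parliament"],
    ["news", "football", "match"],
    ["news", "weather", "storm"],
]
vocabulary = {"news", "election", "parliament", "football"}
doc = ["news", "election", "election", "parliament", "minister"]

ranked = rank_keywords(doc, corpus, vocabulary)
for term, score in ranked:
    print(f"{term}\t{score:.3f}")
```

In this toy run "election" ranks first because it is frequent in the text but rare in the corpus, while "news" ranks last because it occurs in every corpus document. The dependence on `corpus` for the document frequencies is exactly what the paper proposes to replace with thesaurus relations.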

Evaluating a Thesaurus Browser for an Audio-visual Archive

Malaisé

Aroyo

Brugman

et al. 2006

Abstract. In this article we report on a user study aimed at evaluating and improving a thesaurus browser. The browser is intended to be used by documentalists of a large public audio-visual archive for finding appropriate indexing terms for TV programs. The subjects involved in the study were documentalists of the institutions involved. The study provides insight into the value of various thesaurus browsing and searching techniques.

Enhancing Ontology Concept Design by Knowledge Discovery

Brussee

et al. 2007

In this paper, we propose a knowledge discovery-based approach to ontology concept design. In our approach, concept design is a stepwise activity which exploits ontology matching techniques in order to retrieve useful external concepts semantically related to the design at hand. This way, the resulting ontology knowledge space is open towards external knowledge sources, by complementing the ontology expert knowledge with domain knowledge stored in other external sources, such as other domain ontologies, web directories, and, in general, the semantic web.

Apolda: A Practical Tool for Semantic Annotation

Brussee

et al. 2007