Abstract. In the context of large and ever-growing archives, automatically generating annotation suggestions from textual resources related to the documents to be archived is, in theory, an attractive option. It could save a great deal of work in the time-consuming and expensive task of manual annotation, and it could help cataloguers attain higher inter-annotator agreement. In practice, however, some questions arise: What is the quality of the automatically produced annotations? How do they compare with manual annotations and with the annotation requirements defined by the archive? If they differ from the manual annotations, are the automatic annotations wrong? In the CHOICE project, partially hosted at the Netherlands Institute for Sound and Vision, the Dutch public archive for audiovisual broadcasts, we automatically generate annotation suggestions for cataloguers. In this paper, we define three types of evaluation of these annotation suggestions: (1) a classic, strict precision/recall measure expressing the overlap between automatically generated keywords and the manual annotations; (2) a loosened precision/recall measure in which semantically very similar annotations are also counted as relevant matches; and (3) an in-use evaluation of the usefulness of manual versus automatic annotations in the context of serendipitous browsing. During serendipitous browsing, the annotations (manual or automatic) are used to retrieve and visualize semantically related documents.
Context
The Netherlands Institute for Sound and Vision (henceforth S&V) is in charge of archiving publicly broadcast TV and radio programs in the Netherlands. Two years ago the audiovisual production and archiving environment changed from analogue to digital data. This effectively quadrupled the inflow of archival material and, with it, the amount of work for cataloguers.
The two most important customer groups are: 1) professional users from the public broadcasters and 2) users from science and education. These typically have three kinds of user queries:
In many cases, keywords from a restricted set of possible keywords have to be assigned to texts. A common way to find the best keywords is to rank the terms occurring in the text according to their tf.idf value. This requires a corpus of texts from which document frequencies can be derived. In this paper we show that results of the same quality can be obtained without a background corpus, by using relations between terms provided in a thesaurus.
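The tf.idf baseline described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rank_keywords`, the whitespace-free token lists, and the smoothed idf variant (log((1 + N) / (1 + df)) + 1) are assumptions chosen for illustration. The thesaurus-based alternative is not sketched here because the abstract does not specify it.

```python
import math
from collections import Counter


def rank_keywords(doc_tokens, corpus, vocabulary):
    """Rank restricted-vocabulary terms found in doc_tokens by tf.idf.

    doc_tokens: list of tokens of the text to annotate
    corpus:     list of token lists (the background corpus)
    vocabulary: set of allowed keywords (the restricted set)
    """
    n_docs = len(corpus)

    # Document frequency: in how many background texts each term occurs.
    df = Counter()
    for text in corpus:
        df.update(set(text))

    # Term frequency, counting only terms from the restricted vocabulary.
    tf = Counter(t for t in doc_tokens if t in vocabulary)

    scores = {}
    for term, freq in tf.items():
        # Smoothed idf (an assumed variant) avoids division by zero
        # for terms that never occur in the background corpus.
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        scores[term] = freq * idf

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


if __name__ == "__main__":
    corpus = [["news", "election"], ["news", "football"], ["news", "weather"]]
    doc = ["news", "news", "election", "election", "the"]
    print(rank_keywords(doc, corpus, {"news", "election", "football"}))
```

In this toy run both candidate terms occur twice in the text, but "election" appears in only one background document while "news" appears in all three, so "election" ranks first. This dependence on document frequencies is what makes the background corpus necessary for the baseline.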
Abstract. In this article we report on a user study aimed at evaluating and improving a thesaurus browser. The browser is intended to be used by documentalists of a large public audio-visual archive for finding appropriate indexing terms for TV programs. The subjects involved in the study were documentalists of the institutions involved. The study provides insight into the value of various thesaurus browsing and searching techniques.
In this paper, we propose a knowledge discovery-based approach to ontology concept design. In our approach, concept design is a stepwise activity which exploits ontology matching techniques in order to retrieve useful external concepts semantically related to the design at hand. This way, the resulting ontology knowledge space is open towards external knowledge sources, by complementing the ontology expert knowledge with domain knowledge stored in other external sources, such as other domain ontologies, web directories, and, in general, the semantic web.