The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions remain unanswered, however, regarding the nature of collaborative tagging systems including whether coherent categorization schemes can emerge from unsupervised tagging by users. This paper uses data from tagged sites on the social bookmarking site del.icio.us to examine the dynamics of collaborative tagging systems. In particular, we examine whether the distribution of the frequency of use of tags for "popular" sites with a long history (many tags and many users) can be described by a power law distribution, often characteristic of what are considered complex systems. We produce a generative model of collaborative tagging in order to understand the basic dynamics behind tagging, including how a power law distribution of tags could arise. We empirically examine the tagging history of sites in order to determine how this distribution arises over time and patterns prior to a stable distribution. Lastly, by focusing on the high-frequency tags of a site where the distribution of tags is a stabilized power law, we show how tag co-occurrence networks for a sample domain of tags can be used analyze the meaning of particular tags given their relationship to other tags.
Abstract. In Linked Data, the use of owl:sameAs is ubiquitous in interlinking data-sets. There is however, ongoing discussion about its use, and potential misuse, particularly with regards to interactions with inference. In fact, owl:sameAs can be viewed as encoding only one point on a scale of similarity, one that is often too strong for many of its current uses. We describe how referentially opaque contexts that do not allow inference exist, and then outline some varieties of referentially-opaque alternatives to owl:sameAs. Finally, we report on an empirical experiment over randomly selected owl:sameAs statements from the Web of data. This theoretical apparatus and experiment shed light upon how owl:sameAs is being used (and misused) on the Web of data.
This article uses data from the social bookmarking site del.icio.us to empirically examine the dynamics of collaborative tagging systems and to study how coherent categorization schemes emerge from unsupervised tagging by individual users.First, we study the formation of stable distributions in tagging systems, seen as an implicit form of "consensus" reached by the users of the system around the tags that best describe a resource. We show that final tag frequencies for most resources converge to power law distributions and we propose an empirical method to examine the dynamics of the convergence process, based on the Kullback-Leibler divergence measure. The convergence analysis is performed for both the most utilized tags at the top of tag distributions and the so-called long tail.Second, we study the information structures that emerge from collaborative tagging, namely tag correlation (or folksonomy) graphs. We show how community-based network techniques can be used to extract simple tag vocabularies from the tag correlation graphs by partitioning them into subsets of related tags. Furthermore, we also show, for a specialized domain, that shared vocabularies produced by collaborative tagging are richer than the vocabularies which can be extracted from large-scale query logs provided by a major search engine.14:2 • V. Robu et al.Although the empirical analysis presented in this article is based on a set of tagging data obtained from del.icio.us, the methods developed are general, and the conclusions should be applicable across other websites that employ tagging. ACM Reference Format:Robu, V., Halpin, H., and Shepherd, H. 2009. Emergence of consensus and shared vocabularies in collaborative tagging systems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.