Folksonomies offer an easy way to organize information on the current Web. This ease of use, together with their collaborative nature, has led to their extensive adoption in many Social Web projects. However, they have important drawbacks: their exploring and searching capabilities are limited compared with other methods such as taxonomies, thesauruses and ontologies. One of these drawbacks stems from their flexibility in tagging, which frequently produces multiple syntactic variations of the same tag. In this chapter we study the application of two classical pattern matching techniques, Levenshtein distance for imperfect string matching and Hamming distance for perfect string matching, to identify syntactic variations of tags.
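As a rough illustration of the two distances named above, the following minimal sketch computes both for a pair of tags. The function names and the example tags are assumptions made here for illustration; they are not taken from the chapter, and the chapter's actual matching procedure may differ.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical tag pairs, chosen only to exercise the functions.
print(levenshtein("folksonomy", "folksonomies"))  # 3
print(hamming("web20", "web21"))                  # 1
```

Hamming distance is only defined for strings of equal length, so it can flag substitutions but not insertions or deletions, whereas Levenshtein distance handles tags of different lengths.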
Folksonomies have emerged as a common way of annotating and categorizing content using a set of tags that are created and managed collaboratively. Tags carry the semantic information within a folksonomy and thus provide the link to ontologies. The appeal of folksonomies comes from the fact that they require little effort to create and maintain, since they are community-generated. However, they have important drawbacks: their navigation and searching capabilities are limited compared with other methods such as taxonomies, thesauruses and ontologies. One of these drawbacks stems from their flexibility in tagging, which frequently produces multiple syntactic variations of the same tag. Similarity measures allow the correct identification of tag variations when tag lengths are greater than five symbols. In this paper we propose the use of cosine relatedness measures in order to cluster tags with lengths less than or equal to five symbols. We build a discriminator based on the combination of a fuzzy similarity measure and a cosine measure, and we analyze the results obtained.
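One way such a combined discriminator could look is sketched below. Everything specific in it is an assumption made for illustration and not the paper's method: `difflib.SequenceMatcher` stands in for the fuzzy similarity, the cosine measure is taken over hypothetical tag co-occurrence counts, and the thresholds and the `is_variation` name are invented here.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def is_variation(t1, t2, cooc, string_threshold=0.7,
                 cosine_threshold=0.8, short_len=5):
    """Guess whether t1 and t2 are syntactic variations of one tag.

    Assumed rule: long tags are judged on string similarity alone;
    short tags (length <= short_len) must also co-occur with similar tags.
    """
    fuzzy = SequenceMatcher(None, t1, t2).ratio()  # stand-in fuzzy similarity
    if fuzzy < string_threshold:
        return False
    if min(len(t1), len(t2)) > short_len:
        return True
    related = cosine(cooc.get(t1, Counter()), cooc.get(t2, Counter()))
    return related >= cosine_threshold


# Hypothetical co-occurrence counts: tag -> tags it appears alongside.
cooc = {
    "blog": Counter(web=3, post=2),
    "blogs": Counter(web=2, post=2),
    "apple": Counter(fruit=4),
    "apply": Counter(job=3),
}
print(is_variation("blog", "blogs", cooc))   # True
print(is_variation("apple", "apply", cooc))  # False
```

The pair "apple" and "apply" is close as a string but shares no co-occurring tags, so the cosine check rejects it. This is the kind of short-tag ambiguity that string similarity alone cannot resolve.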