Traditional text clustering methods represent documents as "bags of words" and ignore the semantic information in each document. For instance, if two documents use different sets of core words to describe the same topic, they may be wrongly assigned to different clusters because they share no core words, even though those words are probably synonyms or otherwise semantically related. The most common way to solve this problem is to enrich the document representation with background knowledge from an ontology. This approach has two major issues: (1) the coverage of the ontology is limited, even for WordNet or MeSH; (2) using ontology terms as replacement or additional features may cause information loss or introduce noise. In this paper, we present a novel text clustering method that addresses these two issues by enriching the document representation with Wikipedia concept and category information. We develop two approaches, exact-match and relatedness-match, to map text documents to Wikipedia concepts and, from there, to Wikipedia categories. The documents are then clustered using a similarity metric that combines document content, concept, and category information. Experimental results on three datasets (20-newsgroup, TDT2, and LA Times) show that enriching the document representation with Wikipedia concepts and categories significantly improves clustering performance.
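As a rough illustration of a similarity metric that combines content, concept, and category information, the sketch below sums three cosine similarities with weights. The abstract does not give the actual formula, so the linear form, the weights `alpha` and `beta`, and the dictionary layout of each document are assumptions made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(d1, d2, alpha=0.5, beta=0.25):
    """Hypothetical combination: word similarity plus weighted concept
    and category similarity. alpha and beta are assumed weights."""
    return (
        cosine(d1["words"], d2["words"])
        + alpha * cosine(d1["concepts"], d2["concepts"])
        + beta * cosine(d1["categories"], d2["categories"])
    )


# Two toy documents on the same topic that share no words.
doc_a = {
    "words": {"car": 2, "engine": 1},
    "concepts": {"Automobile": 1},
    "categories": {"Vehicles": 1},
}
doc_b = {
    "words": {"automobile": 1, "motor": 1},
    "concepts": {"Automobile": 1},
    "categories": {"Vehicles": 1},
}

word_only = cosine(doc_a["words"], doc_b["words"])
enriched = combined_similarity(doc_a, doc_b)
```

In this toy case the bag-of-words similarity is zero because the documents share no terms, while the enriched score is positive because both map to the same (hand-assigned) concept and category.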
This study assesses the current state of the responsibilities and skill sets required of cataloging professionals. It identifies emerging roles and competencies in the digital environment and relates them to established knowledge of traditional cataloging standards and practices. We conducted a content analysis of 349 job descriptions advertised in AutoCAT in 2005-2006. Multivariate techniques of cluster and multidimensional scaling analyses were applied to the data. Analysis of job titles, required and preferred qualifications and skills, and responsibilities lends perspective to the roles that cataloging professionals play in the digital environment. Technological advances increasingly demand knowledge and skills related to electronic resource management, metadata creation, and computer and Web applications. Emerging knowledge and skill sets are increasingly being integrated into the core technical aspects of cataloging, such as bibliographic and authority control and integrated library system management. Management of cataloging functions is also in high demand. The results of the study provide insight into current and future curriculum design for library and information science programs.
Social tagging, a recent approach to creating metadata, has caught the attention of library and information science researchers. Many researchers recommend incorporating social tagging into the library environment and combining folksonomies with formal classification. However, some researchers are concerned about the quality of social annotation because of its uncontrolled nature. In this study, we compare social tags created by users of the LibraryThing website with the subject terms assigned by experts according to the Library of Congress Subject Headings (LCSH). The purpose of this study is to examine the differences and connections between social tags and expert-assigned subject terms, and to explore the feasibility of and obstacles to implementing social tagging in library systems. The results of our study show that it is possible to use social tags to improve the accessibility of library collections. However, the existence of non-subject-related tags may impede the application of social tagging in traditional library cataloguing systems.