Abstract: For document ordering and classification to advance toward optimal manual and automated systems, it is necessary for a science of document or library classification to be developed. Seven questions are posed that the author feels must be addressed, if not answered, before optimal systems can be developed. Suggestions are made as to the form that answers to these questions might take.
“…The relationship between the amount of information in a document, or any other type of informational object, and the amount of metainformation available, that is, the metadata that describes the informational object, is key for information professionals, especially those associated with the design and application of metainformation systems through indexing and cataloging in the library and information and knowledge professionals (Gnoli 2012; Losee 1993; Smiraglia & van den Heuvel 2013). Understanding the usefulness of different combinations of information and metainformation is the focus of the discussion below.…”
Carolina at Chapel Hill. His interests are in organizing information, information retrieval, and the study of information and knowledge. His most recent book, Information From Processes: About the Nature of Information Creation, Use, and Representation, addresses the nature of information and knowledge, providing a precise definition for both. He has a strong interest in both Library Science and Information Science and tries to bring the two together whenever possible.
“…A linear classification system is proposed that has the capability to order existing media fragments for the display of related fragments, approximating many of the characteristics of hypermedia systems. Organization facilitates browsing, allowing users to find items of which they were unaware when they began the search (Baker, 1986; Boll, 1985; Cover & Walsh, 1988; Huestis, 1988; Losee, 1993b; Marchionini, 1987; Morse, 1970).…”
Relevance and economic feedback may be used to produce an ordering of documents that supports browsing in hypermedia and digital libraries. Document classification based on the Gray code provides paths through the entire collection, each path traversing each node in the set of documents exactly once. Systems organizing documents based on weighted and unweighted Gray codes are examined. Relevance feedback is used to conceptually organize the collection for an individual to browse, based on that individual's interests and information needs, as reflected by their relevance judgements and user-supplied economic preferences. We apply Bayesian learning theory to estimate the characteristics of documents of interest to the user and supply an analytic model of browsing performance, based on minimizing the Expected Browsing Distance (EBD). Economic feedback may be used to change the ordering of documents to benefit the user. Using these techniques, a hypermedia or digital library may order any and all available documents, not just those examined, based on the information provided by the searcher or people with similar interests.
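To make the idea of a Gray code path through a collection concrete, here is a minimal sketch of an unweighted ordering. It assumes each document is described by a fixed-length vector of binary features and uses the standard binary reflected Gray code; the abstract does not say which Gray code variant, feature set, or weighting the paper uses, so the function names (`bits_to_int`, `gray_rank`, `gray_order`) and the sample data are purely illustrative. The relevance feedback, economic feedback, and EBD components are not modeled here.

```python
from typing import Dict, List, Sequence


def bits_to_int(bits: Sequence[int]) -> int:
    """Pack a binary feature vector into an integer (first feature most significant)."""
    value = 0
    for bit in bits:
        value = (value << 1) | (1 if bit else 0)
    return value


def gray_rank(codeword: int) -> int:
    """Position of a codeword in the binary reflected Gray code sequence."""
    rank = 0
    while codeword:
        rank ^= codeword   # fold in each right-shifted copy of the codeword
        codeword >>= 1
    return rank


def gray_order(docs: Dict[str, Sequence[int]]) -> List[str]:
    """Sort document ids by the Gray code position of their feature vectors."""
    return sorted(docs, key=lambda doc_id: gray_rank(bits_to_int(docs[doc_id])))


# Hypothetical collection: three binary features per document.
docs = {
    "d1": (1, 1, 0),
    "d2": (0, 0, 1),
    "d3": (0, 1, 1),
    "d4": (1, 0, 0),
    "d5": (0, 0, 0),
}
print(gray_order(docs))  # ['d5', 'd2', 'd3', 'd1', 'd4']
```

When every possible feature vector occurs in the collection, neighbors in this ordering differ in exactly one feature. With a sparse collection such as the one above, neighbors can differ in more (d3 and d1 differ in two features), which is one reason an ordering of this kind may benefit from tuning to a user's interests.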
“…Libraries have long brought books on similar topics together, this being a consideration in the development of many classification systems, such as the various Dewey decimal systems that are used throughout the world (Losee, 1993; Foskett, 1996). By placing similar items near each other in a library, browsing is improved, but not made perfect.…”
When do information retrieval systems using two document clusters provide better retrieval performance than systems using no clustering? We answer this question for one set of assumptions and suggest how this may be studied with other assumptions. The "Cluster Hypothesis" asks an empirical question about the relationships between documents and user-supplied relevance judgments, while the "Cluster Performance Question" proposed here focuses on the when and why of information retrieval or digital library performance for clustered and unclustered text databases. This may be generalized to study the relative performance of m versus n clusters.