The literature documents many problems with enterprise document search. Studies report that typical knowledge workers spend between 10% and 20% of their time searching for documents that they never find. Although many argue that metadata can improve enterprise document search, few organizations actually use it, which represents a missed opportunity. This article describes the results of two simulation experiments that evaluate the impact of metadata on the costs and benefits of enterprise search. The first study provides quantitative evidence of the increase in recall and precision that stems from metadata-enhanced document searches. The second study demonstrates that simple metadata structures can be nearly as effective as complex ones, implying that the cost of creating and maintaining metadata is likely to be lower than generally thought. This is the first study to provide explicit quantitative evidence of the gains that can be achieved from the use of metadata, and one of only a handful of studies that examine the cost of creating and maintaining metadata.
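For readers unfamiliar with the two measures named above, the following minimal sketch shows how precision and recall are conventionally computed for a single query. The document IDs and result sets are invented purely for illustration and are not data from the study; the function name `precision_recall` is likewise an assumption.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, invented for illustration only.
relevant = {1, 2, 3, 4}            # documents the searcher actually needs
keyword_only = {1, 2, 5, 6, 7, 8}  # assumed result set of a plain keyword search
with_metadata = {1, 2, 3, 5}       # assumed result set of a metadata-enhanced search

baseline = precision_recall(keyword_only, relevant)
enhanced = precision_recall(with_metadata, relevant)
print("keyword only: ", baseline)
print("with metadata:", enhanced)
```

In this made-up case both measures rise when metadata is used, which is the kind of effect the first study sets out to quantify.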
This paper presents a general method for interactively searching for objects (alternatives) in a large collection the contents of which are unknown to the user and where the objects are defined by a large number of discrete-valued attributes. Briefly, the method presents an object and asks the user to indicate his or her preference for the object. The method allows preference indications in two basic modes: (1) by assignment of objects to predefined preference categories such as high, medium, and low preference or (2) by direct preference comparison of objects such as “object A preferred to object B.” From these preference statements, the method learns about the user's preferences and constructs an approximation to a value or preference function of the user (additive or multiplicative) at each iteration. It then uses this approximate preference function to rerank the objects in the collection and retrieve the top-ranked ones to present to the user at the next iteration. The process terminates when the user is satisfied with the list of top-ranked objects. This method can also be used to solve general multiattribute discrete alternative problems, where the alternatives are known with certainty and described by a set of discrete-valued attributes. Test results are reported and application possibilities are discussed.
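The iterative loop described above (collect preference statements, fit an approximate preference function, rerank the collection) can be sketched as follows. This is only an illustration under stated assumptions: it handles the pairwise-comparison mode only, it uses an additive function with one weight per attribute value, and it fits those weights with a simple perceptron-style update, which is a stand-in and not necessarily the learning procedure used in the paper. All names (`value`, `fit_additive`, `rerank`) and the toy data are hypothetical.

```python
from collections import defaultdict


def value(weights, obj):
    """Additive preference function: sum of one weight per (attribute index, value)."""
    return sum(weights[(i, v)] for i, v in enumerate(obj))


def fit_additive(preferences, epochs=50, step=1.0):
    """Fit weights from statements of the form (preferred_object, other_object).

    Assumption: perceptron-style updates, used here only as a simple stand-in.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        updated = False
        for better, worse in preferences:
            # A statement is violated if the preferred object does not score higher.
            if value(weights, better) <= value(weights, worse):
                for i, v in enumerate(better):
                    weights[(i, v)] += step
                for i, v in enumerate(worse):
                    weights[(i, v)] -= step
                updated = True
        if not updated:  # every statement is satisfied
            break
    return weights


def rerank(weights, collection, top_k=3):
    """Return the top_k objects under the current approximate preference function."""
    return sorted(collection, key=lambda o: value(weights, o), reverse=True)[:top_k]


# Toy collection: each object is a tuple of discrete attribute values (color, size).
collection = [("blue", "small"), ("red", "small"), ("blue", "large"), ("red", "large")]

# Preference statements gathered so far: "object A preferred to object B".
preferences = [
    (("red", "small"), ("blue", "small")),
    (("red", "large"), ("red", "small")),
]

weights = fit_additive(preferences)
top = rerank(weights, collection, top_k=1)
print(top)
```

In an interactive session, the top-ranked objects would be shown to the user, new preference statements appended to `preferences`, and the fit and rerank steps repeated until the user is satisfied with the list.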
Keyword search has failed to adequately meet the needs of enterprise users. This is largely due to the size of document stores, the distribution of word frequencies, and the indeterminate nature of languages. The authors argue that a different approach needs to be taken, and draw on the successes of dimensional data modeling and subject indexing to propose a solution. They test their solution by performing search queries on a large research database. By incorporating readily available subject indexes into the search process, they obtain order-of-magnitude improvements in the performance of search queries. Their performance measure is the ratio of the number of documents returned without using subject indexes to the number of documents returned when subject indexes are used. The authors explain why the observed tenfold improvement in search performance on their research database can be expected to occur for searches on a wide variety of enterprise document stores.
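The performance measure defined above is a simple ratio, and a short sketch makes the arithmetic concrete. The function name `reduction_ratio` and the result counts are hypothetical, chosen only so that the ratio comes out to ten; they are not figures reported by the authors.

```python
def reduction_ratio(returned_without_index, returned_with_index):
    """Documents returned without subject indexes divided by documents returned with them."""
    if returned_with_index <= 0:
        raise ValueError("the indexed search must return at least one document")
    return returned_without_index / returned_with_index


# Hypothetical result counts for one query, invented for illustration.
keyword_only_hits = 1200  # assumed size of the plain keyword result set
indexed_hits = 120        # assumed size once a subject index narrows the search

ratio = reduction_ratio(keyword_only_hits, indexed_hits)
print(ratio)
```

A ratio of 10 means the searcher has one tenth as many documents to review. Note that the measure counts returned documents only; it says nothing by itself about whether the relevant ones are retained.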