We compare standard global IR searching with user-centric localized techniques to address the database selection problem. We conduct a series of experiments to compare the retrieval effectiveness of three separate search modes applied to a hierarchically structured data environment of textual database representations. The data environment is represented as a tree-like directory containing over 15,000 unique databases and over 100,000 total leaf nodes. Our search modes consist of varying degrees of browse and search, from a global search at the root node to a refined search at a sub-node using dynamically calculated inverse document frequencies (idfs) to score candidate databases for probable relevance. Our findings indicate that a browse and search approach that relies upon localized searching from sub-nodes is capable of producing the most effective results.

INTRODUCTION

The continued growth of online databases has made the work of finding the most relevant collections increasingly difficult. Until recently, executing a 'search' in a database directory and 'drilling down' into its hierarchical structure have largely been regarded as separate activities. If either approach fails to provide the desired results, large numbers of users exit online systems with unmet information needs. Yahoo! and the Open Directory Project are exceptions that permit integrated browse and search. Research has begun to explore categorization and retrieval in such environments [4]. We hypothesized that if users could first browse to a potentially relevant sub-node in a large directory, results from a search in the sub-directory would be more precise than results from a search in the entire directory. To test the effectiveness of browse plus search functionality, we designed and conducted a series of experiments on three search modes.
Using the same set of real user queries, these search modes were: (1) a global search of the directory from the root node, (2) a localized search of the relevant sub-directories using global idfs, and (3) a localized search of the relevant sub-directories using the appropriate dynamically calculated local idfs.
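The difference between the three search modes can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than the system evaluated here: the toy `DIRECTORY`, the mode names, the helper functions, and the particular weighting idf(t) = log(1 + N / df(t)) are hypothetical stand-ins, since the scoring formula actually used is not given in this section. The sketch only shows where the candidate set and the idf statistics come from in each mode.

```python
import math
from collections import Counter

# Hypothetical toy directory: sub-node name -> {database name -> profile text}.
DIRECTORY = {
    "law": {
        "tax_cases": "tax court cases federal tax",
        "tax_statutes": "federal tax statutes code",
        "patent_cases": "patent court cases infringement",
    },
    "news": {
        "tax_news": "tax news federal budget",
        "sports_news": "sports news scores",
    },
}


def tokenize(text):
    return text.lower().split()


def compute_idfs(profiles):
    """Assumed weighting: idf(t) = log(1 + N / df(t)) over the given profiles."""
    n = len(profiles)
    df = Counter()
    for text in profiles.values():
        df.update(set(tokenize(text)))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def score(query, text, idfs):
    """Sum of tf * idf over the query terms (a simple tf-idf variant)."""
    tf = Counter(tokenize(text))
    return sum(tf[term] * idfs.get(term, 0.0) for term in tokenize(query))


def rank(query, candidates, idfs):
    """Return (name, score) pairs with non-zero score, best first."""
    scored = [(name, score(query, text, idfs)) for name, text in candidates.items()]
    scored = [(name, s) for name, s in scored if s > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def all_databases(directory):
    merged = {}
    for databases in directory.values():
        merged.update(databases)
    return merged


def search(query, directory, mode, subnode=None):
    everything = all_databases(directory)
    if mode == "global":
        # Mode 1: candidates and idfs both come from the whole directory.
        return rank(query, everything, compute_idfs(everything))
    local = directory[subnode]
    if mode == "local_global_idf":
        # Mode 2: candidates from the sub-node, idfs from the whole directory.
        return rank(query, local, compute_idfs(everything))
    if mode == "local_local_idf":
        # Mode 3: candidates and idfs both come from the sub-node.
        return rank(query, local, compute_idfs(local))
    raise ValueError(f"unknown mode: {mode}")
```

Calling `search("tax cases", DIRECTORY, "global")` ranks databases from every sub-node, whereas the two localized modes with `subnode="law"` restrict the candidates to that sub-node and differ only in which collection the idf statistics are drawn from.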
In this work, we compare standard global IR searching with more localized techniques to address the database selection problem. We conduct a series of experiments to compare the retrieval effectiveness of three separate search modes using a hierarchically structured data environment of textual database representations. The data environment is represented as a tree-like structure containing over 15,000 unique databases and approximately 100,000 total leaf nodes. The search modes consist of varying degrees of browse and search, from a global search at the root node to a refined search at a sub-node using dynamically calculated inverse document frequencies (idfs) to score the candidate databases for probable relevance. Our findings indicate that a browse plus search approach that relies upon localized searching from sub-nodes in this environment produces the most effective results.

Introduction

The continued growth of online databases has made the work of finding the most relevant databases increasingly challenging. Until recently, searching a metadata repository and 'drilling down' into its hierarchical structure, e.g., in a data directory, have largely remained separate activities. That is, browse and search tasks in the same repository have often been presented as mutually exclusive. As a result, large numbers of users exit online systems with unmet information needs after failing to find relevant sources of interest. This was the case with the Westlaw (Database) Directory. We hypothesized that if users could first browse to a potentially relevant sub-directory in the large directory, results from a search in the sub-directory would be more precise than results from a search on the entire directory. To test the effectiveness of browse plus search functionality, we designed and conducted a series of experiments on three search modes, using the same set of real user queries.
These search modes include (1) a global search of the directory from the root node, (2) a localized search of the relevant sub-directories using global idfs, and (3) a localized search of the relevant sub-directories using the appropriate local idfs. In the next section we review related work. Section 3 briefly describes our operational environment, while section 4 discusses the underlying data. Section 5 describes the user queries harnessed for this investigation. Section 6 addresses the particular tf-idf scoring algorithm used. Our experiments are outlined in section 7 and our results are presented in section 8. In section 9 we draw our conclusions, and in section 10 we mention future applications of this browse and search technology.

Previous Work

An appreciable body of work has focused on searching distributed databases of textual documents for relevant information in response to user queries (Gravano, 1994; Callan, 1995; Yuwono, 1997; French, 1999). Yet such fully automated retrieval, and the corpus of related research which followed, has been performed independently of additional user involvement. For this reason, the IR commu...