Existing methods for automatically analyzing search logs describe search behavior on the basis of purely syntactic differences (overlapping terms) between queries. Although these syntactic statistics provide valuable insights into the complexity and success of search interactions, they offer a limited interpretation of the observed searching behavior, as they do not consider the semantics of users' queries. Recently, large amounts of semantic information have become publicly available in the form of linked data. In this paper we propose a method to exploit this information to enrich search queries with linked data entities so as to determine the semantic types of the queries and the relations between queries that are consecutively entered in a search session. This work also provides an in-depth analysis of the search logs of the commercial picture portal of a European news agency, which offers access to photographic images to professional users. Compared to previous image search log analyses, in particular those of professional users, we consider a much larger dataset. We analyze the logs both in the more traditional syntactic way and using the newly proposed semantic approach, and compare the results. Our findings show the benefits of using semantics for search log analysis: the identified types of query modifications cannot be appropriately analyzed with a purely statistical approach that only considers term overlap, since queries related in the most frequent ways do not usually share terms. We discuss implications of our findings for improving log analysis, image collection management, and search engine design.
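To make the syntactic baseline described above concrete, the following is a minimal sketch of a term-overlap classifier for two consecutive queries. The function name `classify_modification` and the five labels are assumptions chosen for illustration; they are not the taxonomy used in the paper.

```python
def classify_modification(prev: str, curr: str) -> str:
    """Label the change between two consecutive queries by term overlap only.

    The label set is a hypothetical one, used here for illustration.
    """
    a = set(prev.lower().split())
    b = set(curr.lower().split())
    if a == b:
        return "repeat"            # same bag of terms
    if a < b:
        return "specialization"    # terms were added
    if b < a:
        return "generalization"    # terms were removed
    if a & b:
        return "reformulation"     # some terms kept, some replaced
    return "new"                   # no shared terms at all


if __name__ == "__main__":
    print(classify_modification("obama", "obama white house"))
    print(classify_modification("paris", "france"))
```

The second call is labeled "new" because the two queries share no terms, even though the entities are clearly related. This is the kind of case the abstract argues a purely term-based analysis cannot capture.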
Result diversification deals with ambiguous or multi-faceted queries by providing documents that cover as many subtopics of a query as possible. Various approaches to subtopic modeling have been proposed. Subtopics have been extracted internally, e.g., from retrieved documents, and externally, e.g., from Web resources such as query logs. Internally modeled subtopics are often implicitly represented, e.g., as latent topics, while externally modeled subtopics are often explicitly represented, e.g., as reformulated queries. We propose a framework that: i) combines both implicitly and explicitly represented subtopics; and ii) allows flexible combination of multiple external resources in a transparent and unified manner. Specifically, we use a random walk based approach to estimate the similarities of the explicit subtopics mined from a number of heterogeneous resources: click logs, anchor text, and web n-grams. We then use these similarities to regularize the latent topics extracted from the top-ranked documents, i.e., the internal (implicit) subtopics. Empirical results show that regularization with explicit subtopics extracted from the right resource leads to improved diversification results, indicating that the proposed regularization with (explicit) external resources forms better (implicit) topic models. Click logs and anchor text are shown to be more effective resources than web n-grams under current experimental settings. Combining resources does not always lead to better results, but achieves a robust performance. This robustness is important for two reasons: it cannot be predicted which resources will be most effective for a given query, and it is not yet known how to reliably determine the optimal model parameters for building implicit topic models.
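The abstract does not specify the random walk itself, so the following is only a sketch of one common variant: a two-step walk on a bipartite graph that links explicit subtopics to the items they co-occur with (for a click log, clicked URLs). The function name `two_step_walk`, the input layout, and the toy counts are all assumptions.

```python
from collections import defaultdict


def two_step_walk(counts):
    """Return P(subtopic -> item -> subtopic') for every subtopic pair.

    `counts` maps each subtopic to a dict {item: co-occurrence count}.
    This layout is an assumption made for the sketch.
    """
    # Total weight arriving at each item, used for the item -> subtopic step.
    item_totals = defaultdict(float)
    for items in counts.values():
        for item, c in items.items():
            item_totals[item] += c

    sim = {}
    for s, items in counts.items():
        row_total = sum(items.values())
        probs = defaultdict(float)
        for item, c in items.items():
            p_item = c / row_total  # step 1: subtopic -> item
            for t, t_items in counts.items():
                if item in t_items:
                    # step 2: item -> subtopic t
                    probs[t] += p_item * t_items[item] / item_totals[item]
        sim[s] = dict(probs)
    return sim


if __name__ == "__main__":
    toy = {
        "jaguar car": {"u1": 2, "u2": 2},
        "jaguar price": {"u2": 1},
        "jaguar animal": {"u3": 3},
    }
    for subtopic, row in two_step_walk(toy).items():
        print(subtopic, row)
```

In the toy data, "jaguar car" and "jaguar price" share a clicked item and so receive a nonzero similarity, while "jaguar animal" stays isolated. Similarities of this kind are what the framework would then use to regularize the latent topics.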
Analysis of existing methods for automatic optimization of link structures shows that these methods rely heavily on assumptions about the preferences and navigation behavior of users. Authors often do not state these assumptions explicitly and do not evaluate whether the assumptions are consistent with the actual behavior of the users of the site. This is a serious deficiency as experiments with simulated users show that incorrect assumptions can easily lead to inefficient link structures. In this work we present a framework that gives a systematic overview of alternative assumptions. On the basis of the framework we can select a set of assumptions that best matches the navigation behavior of the users in the site's log files. We also present a method for optimizing hierarchical navigation menus on the basis of the selected assumptions. This method can be used interactively under full control of a web master. The system proposes modifications of the structure and explains why these modifications lead to more efficient menus. Evaluation by means of a case study shows that the modifications that are proposed effectively reduce the expected navigation time while preserving the coherence of the menu structure.
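As an illustration of how an assumption about navigation behavior turns into a measurable cost, the sketch below computes the expected navigation cost of a hierarchical menu under one hypothetical assumption: at every level the user reads all items before choosing. The function name `expected_cost`, the nested-dict menu layout, and the cost model are assumptions for this example, not the model selected in the paper.

```python
def expected_cost(menu, cost_so_far=0.0):
    """Expected number of menu items read before reaching the target page.

    `menu` maps a label either to a submenu (dict) or to the probability
    that the page behind that label is the user's target (float).
    Assumed behavior: the user reads every item on each level visited.
    """
    level_cost = cost_so_far + len(menu)
    total = 0.0
    for child in menu.values():
        if isinstance(child, dict):
            total += expected_cost(child, level_cost)
        else:
            total += child * level_cost
    return total


if __name__ == "__main__":
    flat = {name: 0.125 for name in "abcdefgh"}
    nested = {
        "group 1": {name: 0.125 for name in "abcd"},
        "group 2": {name: 0.125 for name in "efgh"},
    }
    print(expected_cost(flat), expected_cost(nested))
```

Under this assumption the two-level menu is cheaper than the flat one. A different assumption, for example that users stop reading at the first matching item, would change the numbers and possibly the ranking, which is why the choice of assumptions has to be checked against the log files.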
The workshop on Usage Analysis and the Web of Data (USEWOD2011) was the first workshop in the field to investigate combinations of usage data with semantics and the Web of Data. Questions the workshop aims to address include: how can semantics help in understanding usage data, how can semantic information be derived from usage data, how can we learn about usage of and on the emerging Web of Data, and what can we learn from it? We report on the findings and results of this