The purpose of this paper is to test the reliability of query intents derived from queries, either by the user who entered the query or by another juror. We report the findings of three studies: First, we conducted a large-scale classification study (approximately 50,000 queries) using a crowdsourcing approach. Then, we used click-through data from a search engine log to validate the judgments given by the jurors from the crowdsourcing study. Finally, we conducted an online survey on a commercial search engine's portal. Since we used the same queries for all three studies, we were able to compare the results and the effectiveness of the different approaches as well. We found that neither the crowdsourcing approach, using jurors who classified queries originating from other users, nor the questionnaire approach, using searchers who were asked about their own query that they had just entered into a web search engine, leads to satisfying results. This leads us to conclude that there is little understanding of the classification task, even though both groups of jurors were given detailed instructions. While we used manual classification, our research has important implications for automatic classification as well: we must question the success of approaches that use automatic classification and compare its performance to a baseline from human jurors.

Keywords: search engines, information needs, query classification, user intent, web queries, web searching

Deriving Query Intents from Web Search Engine Queries

Search engines are by far the major means of finding information on the Web. In just one month, 131 billion queries were posed to general-purpose search engines (ComScore, 2010).
Studies report that the performance of search engines in terms of retrieval effectiveness is only moderate (Bar-Ilan, Keenoy, Yaari, & Levene, 2007; Griesbaum, 2004; Véronis, 2006; Lewandowski, 2008), and search engines' responses to more complex search tasks such as exploratory searches (Marchionini, 2006) are considered poor (Singer, Norbisrath, Lewandowski, Vainikko, & Kikkas, 2011). When considering simpler tasks such as finding home pages, however, the performance of search engines is quite good (Lewandowski, 2011). It is clear, first, that the quality of search engines' results depends on the difficulty of the task, and second, that search engines perform better on simpler tasks. To identify task types and to provide the most suitable results for the individual tasks, it is important to identify the users' intent behind their queries.

In this paper, we focus on identifying the following query intents:

1. Informational, where the user aims at finding documents on a topic they are interested in.
2. Navigational, where the user aims at navigating to a web page that is already known, or where the user at least assumes that a specific web page exists.
3. Transactional, where the user wants to find a web page where a further transaction (e.g., downloading software, playing a game) can be performed.
4. Commercial, wher...
Search engine queries are the starting point for studies in different fields, such as health or political science. These studies usually aim to make statements about social phenomena. However, the queries used in the studies are often created rather unsystematically and do not correspond to actual user behavior. Therefore, the evidential value of the studies must be questioned. We address this problem by developing an approach (query sampler) to sample queries from commercial search engines, using keyword research tools designed to support search engine marketing. This allows us to generate large numbers of queries related to a given topic and to derive information on how often each keyword is searched for, that is, the query volume. We empirically test our approach with queries from two published studies, and the results show that the number of queries and the total search volume could be considerably expanded. Our approach has a wide range of applications for studies that seek to draw conclusions about social phenomena using search engine queries. The approach can be applied flexibly to different topics and is relatively straightforward to implement, as we provide the code for querying the Google Ads API. Limitations are that the approach needs to be tested with a broader range of topics and thoroughly checked for problems with topic drift and the role of close variants provided by keyword research tools.
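The query-sampler workflow described above, expanding a set of seed queries for a topic and ranking the expansions by search volume, can be sketched in Python. This is a minimal, illustrative sketch only: the function names, the request-parameter shape, and the response fields (e.g. "avg_monthly_searches") are assumptions for illustration, not the authors' published code for the Google Ads API, and the actual API call (e.g. via the google-ads client's keyword-ideas service) is omitted.

```python
def build_seed_request(seed_keywords, language="en", geo="US"):
    """Assemble illustrative request parameters for a keyword-ideas lookup.

    A real implementation would pass equivalent settings to a keyword
    research tool such as the Google Ads API; the dict shape here is a
    hypothetical stand-in for that request object.
    """
    return {
        "keyword_seed": {"keywords": list(seed_keywords)},
        "language": language,
        "geo_targets": [geo],
    }


def filter_ideas(ideas, min_volume=10):
    """Keep expanded queries whose average monthly search volume meets a
    floor, sorted by volume descending (the paper's 'query volume').

    Each idea is assumed to be a dict with 'text' and
    'avg_monthly_searches' keys, mimicking a parsed tool response.
    """
    kept = [i for i in ideas if i["avg_monthly_searches"] >= min_volume]
    return sorted(kept, key=lambda i: i["avg_monthly_searches"], reverse=True)
```

In use, one would send the built request to the keyword research tool, parse its suggestions into the idea dicts, and apply the volume filter to obtain a topic-related query sample ordered by how often each query is searched.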