Annotation-based techniques for image retrieval suffer from sparse and short image textual descriptions. Moreover, users are often not able to describe their needs with the most appropriate keywords. This situation is a breeding ground for a vocabulary mismatch problem resulting in poor results in terms of retrieval precision. In this paper, we propose a query expansion technique for queries expressed as keywords and short natural language descriptions. We present a new massive query expansion strategy that enriches queries using a graph knowledge base by identifying the query concepts, and adding relevant synonyms and semantically related terms. We propose a topological graph enrichment technique that analyzes the network of relations among the concepts, and suggests semantically related terms by path and community detection analysis of the knowledge graph. We perform our expansions by using two versions of Wikipedia as knowledge base achieving improvements of the system's precision up to more than 27%.
The use of synthetic graph generators is a common practice among graph-oriented benchmark designers, as it allows obtaining graphs with the required scale and characteristics. However, finding a graph generator that accurately fits the needs of a given benchmark is very difficult, thus practitioners end up creating ad-hoc ones. Such a task is usually timeconsuming, and often leads to reinventing the wheel. In this paper, we introduce the conceptual design of DataSynth, a framework for property graphs generation with customizable schemas and characteristics. The goal of DataSynth is to assist benchmark designers in generating graphs efficiently and at scale, saving from implementing their own generators. Additionally, DataSynth introduces novel features barely explored so far, such as modeling the correlation between properties and the structure of the graph. This is achieved by a novel property-to-node matching algorithm for which we present preliminary promising results.
Knowledge bases are very good sources for knowledge extraction, the ability to create knowledge from structured and unstructured sources and use it to improve automatic processes as query expansion. However, extracting knowledge from unstructured sources is still an open challenge. In this respect, understanding the structure of knowledge bases can provide significant benefits for the effectiveness of such purpose. In particular, Wikipedia has become a very popular knowledge base in the last years because it is a general encyclopedia that has a large amount of information and thus, covers a large amount of different topics. In this piece of work, we analyze how articles and categories of Wikipedia relate to each other and how these relationships can support a query expansion technique. In particular, we show that the structures in the form of dense cycles with a minimum amount of categories tend to identify the most relevant information
The search for relevant information can be very frustrating for users who, unintentionally, use too general or inappropriate keywords to express their requests. To overcome this situation, query expansion techniques aim at transforming the user request by adding new terms, referred as expansion features, that better describe the real intent of the users. We propose a method that relies exclusively on relevant structures (as opposed to the use of semantics) found in knowledge bases (KBs) to extract the expansion features. We call our method Structural Query Expansion (SQE). The structural analysis of KBs takes us to propose a set of structural motifs that connect their strongly related entries, which can be used to extract expansion features. In this paper we use Wikipedia as our KB, which is probably one of the largest sources of information. SQE is capable of achieving more than 150% improvement over non expanded queries and is able to identify the expansion features in less than 0.2 seconds in the worst case scenario. Most significantly, we believe that we are contributing to open new research directions in query expansion, proposing a method that is orthogonal to many current systems. For example, SQE improves pseudo-relevance feedback techniques up to 13%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.