Abstract-Gnutella peers independently choose how objects are named and how they are queried. Using a long-term analysis of the files shared and the queries issued, we show that this flexibility leads to a mismatch between the way objects were named and the way users issued search queries. Thirty percent of the failed queries contained keywords that were not present in any file name, while the remaining queries failed because no single file name contained all the keywords in the query. Our earlier analysis of files shared in the popular iTunes music file sharing system showed that standardizing file names to make them easier to search is not a viable alternative. Instead, we transform the queries to better match the objects available in the system. We investigate spelling correction (using file name information from the neighborhood) as well as the removal of query keywords. We consider the results of a transformed query to be relevant to the intent of the original query if the transformed query uses many of the original keywords and the number of matching files is close to the number of matches for typical successful queries. Our approach is practical and uses only information available within the immediate neighborhood of an ultra-peer. An overlay-agnostic analysis shows that our transformation improves success rates from 45% to between 72.5% and 91.2%. Using our Hybrid mechanism as a Gnutella middleware, our transformation produced relevant results for about 61% of the failed queries.
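The two failure modes and the keyword-removal idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the function names, and the `min_keep` threshold are assumptions made for illustration, and only the "keeps many of the original keywords" part of the relevance criterion is modeled (the comparison against typical match counts is omitted).

```python
import re
from itertools import combinations


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def matching_files(query, file_names):
    """Return the files whose names contain every query keyword."""
    keywords = tokenize(query)
    return [f for f in file_names if keywords and keywords <= tokenize(f)]


def classify(query, file_names):
    """Label a query as successful or as one of the two failure modes."""
    if matching_files(query, file_names):
        return "success"
    vocabulary = set().union(*(tokenize(f) for f in file_names))
    if tokenize(query) - vocabulary:
        # At least one keyword appears in no file name at all.
        return "unknown_keyword"
    # Every keyword exists somewhere, but never together in one file name.
    return "no_single_file_has_all_keywords"


def drop_keywords(query, file_names, min_keep=2):
    """Try the largest keyword subsets first; return the first that matches.

    min_keep is a hypothetical lower bound on how many original keywords
    must survive for the result to count as related to the original query.
    """
    keywords = sorted(tokenize(query))
    for size in range(len(keywords) - 1, min_keep - 1, -1):
        for subset in combinations(keywords, size):
            hits = matching_files(" ".join(subset), file_names)
            if hits:
                return list(subset), hits
    return None, []


files = [
    "Beatles - Hey Jude.mp3",
    "Beatles - Let It Be.mp3",
    "Miles Davis - So What.mp3",
]
print(classify("beetles jude", files))
print(classify("beatles davis", files))
print(drop_keywords("beatles hey jude live", files))
```

In this toy file list, a misspelled keyword falls into the first failure mode (a candidate for spelling correction), while a query whose keywords all exist but never co-occur falls into the second (a candidate for keyword removal).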
Keywords-unstructured peer-to-peer, query transformation

I. INTRODUCTION

Gnutella is popular and accounted for over 40% of all P2P users [1]. However, its search performance [2] remains poor; between 2003 and 2006, the query success rate at a forwarding peer only increased from 3.5% to 6.9% [3].

In general, the performance of unstructured P2P systems depends on the network topology, the search mechanisms that route queries over the overlay, the annotations of shared objects, and the way that users request objects. A large body of prior research (including [4], [5], [6]) investigated various overlay maintenance and search mechanisms that reduce the network overhead of finding shared objects.

Earlier [7], we investigated object annotations and user-generated queries on the Gnutella network. Note that objects in Gnutella are annotated independently by each content provider, without any global coordination. We showed that over 44% of the queries had no matching objects, regardless of the overlay or search mechanism used to locate the objects. In this paper, we focus on these failed queries.

Gnutella [8] requires a match of all query terms against the terms in a particular file name. We show that 30% of the