A comparison of collocation-based similarity measures in query expansion

Kim, Myoung-Cheol; Choi, Key‐Sun

doi:10.1016/s0306-4573(98)00040-5

Cited by 62 publications

(24 citation statements)

References 15 publications

“…Another popular statistical method for choosing expansion terms uses the concept of co-occurrence. For a certain term x in a user query, terms that frequently co-occur with x in the document base are shown to be excellent query expansion candidates (Kim and Choi, 1999). This technique is also evident in web search engines in the form of query suggestions provided either while eliciting the query or after an initial search has been conducted.…”

Section: Query Adaptation (mentioning)

confidence: 93%

A comparative survey of Personalised Information Retrieval and Adaptive Hypermedia techniques

Steichen

Ashman

Wade

2012

Information Processing & Management

Abstract. A key driver for next generation web information retrieval systems is becoming the degree to which a user's search and presentation experience is adapted to individual user properties and contexts of use. Over the past decades, two parallel threads of personalisation research have emerged, one originating in the document space in the area of Personalised Information Retrieval (PIR) and the other arising from the hypertext space in the field of Adaptive Hypermedia (AH).

PIR typically aims to bias search results towards more personally relevant information by modifying traditional document ranking algorithms. Such techniques tend to represent users with simplified personas (often based on historic interests), enabling the efficient calculation of personalised ranked lists. On the other hand, the field of Adaptive Hypermedia (AH) has addressed the challenge of biasing content retrieval and presentation by adapting towards multiple characteristics. These characteristics, more typically called personalisation "dimensions", include user goals or prior knowledge, enabling adaptive and personalised result compositions and navigations.

The question arises as to whether it is possible to provide a comparison of PIR and AH, where the respective strengths and limitations can be exposed, but also where potential complementary affordances can be identified. This survey investigates the key techniques and impacts in the use of PIR and AH technology in order to identify such affordances and limitations. In particular, the techniques are analysed by examining key activities in the retrieval process, namely (i) query adaptation, (ii) adaptive retrieval and (iii) adaptive result composition and presentation. In each of these areas, the survey identifies individual strengths and limitations. Following this comparison of techniques, the paper also illustrates an example of a potential synergy in a hybridised approach, where adaptation can be tailored in different aspects of PIR and AH systems. Moreover, the concerns resulting from interdependencies and the respective tradeoffs of techniques are discussed, along with potential future directions and remaining challenges.
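
The co-occurrence idea in the citation statement above (terms that frequently co-occur with a query term x are candidate expansion terms) can be illustrated with a minimal sketch. It assumes whitespace tokenisation and ranks candidates by raw document-level co-occurrence counts; this is a simplification for illustration, not one of the collocation-based similarity measures compared by Kim and Choi. The function name `cooccurrence_candidates` and the toy documents are hypothetical.

```python
from collections import Counter


def cooccurrence_candidates(term, documents, top_k=3):
    """Return up to top_k terms that share the most documents with `term`.

    Assumption: each document is a plain string and tokens are obtained by
    lowercasing and splitting on whitespace.
    """
    term = term.lower()
    counts = Counter()
    for doc in documents:
        tokens = set(doc.lower().split())
        if term in tokens:
            # Count every other token once per document containing `term`.
            counts.update(tokens - {term})
    return [candidate for candidate, _ in counts.most_common(top_k)]


# Toy document base (hypothetical data).
docs = [
    "jaguar car speed",
    "jaguar car engine",
    "jaguar cat jungle",
    "tiger cat jungle",
]
best = cooccurrence_candidates("jaguar", docs, top_k=1)
```

Here "car" appears in two of the three documents that contain "jaguar", so it is the top candidate. A practical system would normally replace the raw count with a normalised association measure so that very common terms do not dominate.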

“…One of them is relevance feedback using the returned results and adding new terms related to the original query and selected documents [44]. Other methods include adding relevant terms based on term frequency, document frequency from top ranked documents [45], [46], co-occurrence based techniques [47], thesaurus based techniques [48][49][50][51], desktop specific techniques [7], probability of terms over search logs [52]. Our approach uses a user intention based keyword addition to expand the original query to handle ambiguous query terms.…”

Section: Query Expansion (mentioning)

confidence: 99%

Context Sensitive Search String Composition Algorithm using User Intention to Handle Ambiguous Keywords

Gajendragadkar

Joshi

2017

IJECE

Finding the required URL among the first few result pages of a search engine is still a challenging task. This may require a number of reformulations of the search string, thus adversely affecting the user's search time. Query ambiguity and polysemy are major reasons for not obtaining relevant results in the top few result pages. Efficient query composition and data organization are necessary for getting effective results. Context of the information need and the user intent may improve the autocomplete feature of existing search engines. This research proposes a Funnel Mesh-5 algorithm (FM5) to construct a search string taking into account the context of the information need and user intention, with three main steps: 1) predict user intention with user profiles and past searches via a weighted mesh structure; 2) resolve ambiguity and polysemy of search strings with context and user intention; 3) generate a personalized disambiguated search string by query expansion encompassing user intention and the predicted query. Experimental results for the proposed approach and a comparison with direct use of a search engine are presented. A comparison of the FM5 algorithm with the K Nearest Neighbor algorithm for user intention identification is also presented. The proposed system provides better precision for search results for ambiguous search strings with improved identification of the user intention. Results are presented for an English language dataset as well as a Marathi (an Indian language) dataset of ambiguous search strings.

Keywords: Autocompletion, Context, Data mining, Search, User intention

Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author: Uma Gajendragadkar, COEP, Pune, Maharashtra, India.

INTRODUCTION

Current search engines churn a large volume of data to obtain meaningful information; however, the main challenge is to get relevant results in the top few result pages [1], [2]. Search engines check for the presence of keywords in documents. Mere presence of keywords in a document may not match the user's search intention and need. User satisfaction increases when more relevant and exact information is presented in the top few results. An appropriately composed query is the starting point for handling this challenge [3]. Performance of search engines can be improved with the use of appropriate keywords or prediction of such keywords [4][5][6]. Search engines use search logs and most popular queries; however, these are not sufficient to predict the user's interests or intention [7].

Users are of three types: first, Internet-skilled users; second, Internet-aware users; and third, Internet-unskilled users. Many times, users do not know the proper keywords for searching information and they cannot express their information need or intent of search [8], [9]. As a result, search results often do not satisfy the user's information need. This problem can be addressed by query expansion and reformulation [3]. Search engines provide a...

“…The set-based similarities [14,18] represent records as sets of tokens and estimates the similarity of records by estimating the similarity of their token sets. Given two token sets s, r ⊂ T we can estimate their similarity by assigning values to s, r, s ∪ r and s ∩ r, and then combining these values into a final similarity score.…”

Section: Definitions Of Pairwise Record Operations (mentioning)

confidence: 99%

“…The first class exploits the connection between set operations and the conjunction and disjunction functions in Proposition 2 to obtain consistent extensions of set-similarity measures, such as Jaccard or Dice, to the vector space model. We refer to, e.g., [18] or [14] for the set-based definitions of these measures. The second class uses the p-norm of a record to define distance-based similarity functions.…”

Section: Generalized Similarity Functions (mentioning)

confidence: 99%

An extended vector space model for information retrieval with generalized similarity measures : theory and applications.

Paskaleva

Bochev

Ames

2012

We present an Extended Vector Space Model (EVSM) for information retrieval, endowed with a new set of similarity functions. Our model considers records as multisets of tokens. A token weight function maps records into real vectors. Using this vector representation we define a p-norm of a record and pairwise conjunction and disjunction operations on records. These operations prompt consistent extensions of published set-based similarity functions and yield new p distance-based similarities. We demonstrate that some well-known similarities form a subset of the new functions resulting from particular choices of token weights and p-values. In so doing, we establish the equivalence of the corresponding information retrieval models with a properly augmented vector space model. The performance of the extended similarity measures is compared by solving an entity matching (EM) problem for two types of benchmark datasets. Among other things, our results show that the new similarity functions perform particularly well on tasks involving matching of records by keywords.

The EVSM served as the foundation for a mathematically rigorous definition of the EM problem. We developed a supervised EM framework that interprets EM as the combinatorial optimization problem of finding the maximum weight matching in a weighted bipartite graph connecting records from two databases, also known as the Linear Sum Assignment Problem (LSAP). Casting EM problems into LSAP offers valuable practical and theoretical advantages. There are efficient algorithms that solve LSAP in polynomial time. Availability of such algorithms reduces the task of solving the EM problem to computing weights for the edges of the bipartite graph connecting the records from the databases. This allowed focusing efforts on the development of robust and flexible methodologies for the estimation of the similarity between records and led to the notion of an optimal similarity function (OSF) for MMIR problems. The OSF is sought as a linear combination of similarity functions for the common relation attributes. Solution of a suitably defined quadratic program using training data defines the weights in the linear combination. Computational studies using the Abt-Buy e-commerce set and publication databases comprising research articles in cloud computing, antennas and information retrieval areas confirm the robustness of our approach.
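
The set-based similarities named in the citation statements above (Jaccard and Dice) combine the sizes of s, r, s ∪ r and s ∩ r into a single score. The sketch below uses the standard textbook definitions on plain Python sets; it does not reproduce the weighted, vector-space extensions of the EVSM. Returning 1.0 when both sets are empty is a convention chosen here for illustration, not something the source specifies.

```python
def jaccard(s, r):
    """Jaccard similarity: |s ∩ r| / |s ∪ r|."""
    s, r = set(s), set(r)
    union = s | r
    # Assumption: two empty token sets are treated as identical.
    return len(s & r) / len(union) if union else 1.0


def dice(s, r):
    """Dice similarity: 2 |s ∩ r| / (|s| + |r|)."""
    s, r = set(s), set(r)
    total = len(s) + len(r)
    # Assumption: two empty token sets are treated as identical.
    return 2 * len(s & r) / total if total else 1.0


# Two records represented as token sets (hypothetical data).
record_s = {"query", "expansion", "terms"}
record_r = {"expansion", "terms", "ranking"}

j = jaccard(record_s, record_r)  # 2 shared tokens, 4 in the union
d = dice(record_s, record_r)     # 2 * 2 shared tokens over 3 + 3
```

For these two records the Jaccard score is 2/4 and the Dice score is 4/6. Dice is never lower than Jaccard for the same pair, since it counts the intersection twice in the numerator.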
