P. C. N. Wong scite author profile

Generalized vector spaces model in information retrieval

In information retrieval, it is common to model index terms and documents as vectors in a suitably defined vector space. The main difficulty with this approach is that the explicit representation of term vectors is not known a priori. For this reason, the vector space model adopted by Salton for the SMART system treats the terms as a set of orthogonal vectors. In such a model it is often necessary to adopt a separate, corrective procedure to take into account the correlations between terms. In this paper, we propose a systematic method (the generalized vector space model) to compute term correlations directly from the automatic indexing scheme. We also demonstrate how such correlations can be included with minimal modification in existing vector-based information retrieval systems. The preliminary experimental results obtained from the new model are very encouraging.
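The effect of the orthogonality assumption can be seen by expanding the inner product of a document vector and a query vector in terms of the term vectors. The notation below (document weights d_i, query weights q_j, term vectors t_i, n index terms) is introduced here for illustration and is not taken from the paper; it is a standard linear-algebra identity, not the paper's own derivation:

```latex
\begin{aligned}
\mathbf{d} &= \sum_{i=1}^{n} d_i\,\mathbf{t}_i, \qquad \mathbf{q} = \sum_{j=1}^{n} q_j\,\mathbf{t}_j,\\
\mathbf{d}\cdot\mathbf{q} &= \sum_{i=1}^{n}\sum_{j=1}^{n} d_i\,q_j\,(\mathbf{t}_i\cdot\mathbf{t}_j),\\
\mathbf{t}_i\cdot\mathbf{t}_j = \delta_{ij} \;&\Longrightarrow\; \mathbf{d}\cdot\mathbf{q} = \sum_{i=1}^{n} d_i\,q_i .
\end{aligned}
```

When the term vectors are orthonormal, only terms shared by the document and the query contribute to the score; once the correlations t_i · t_j are known, related but distinct terms can contribute as well.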


On modeling of information retrieval concepts in vector spaces

Wong

Ziarko

Raghavan

et al. 1987

ACM Trans. Database Syst.

115


The Vector Space Model (VSM) has been adopted in information retrieval as a means of coping with inexact representation of documents and queries, and the resulting difficulties in determining the relevance of a document relative to a given query. The major problem in employing this approach is that the explicit representation of term vectors is not known a priori. Consequently, earlier researchers made the assumption that the vectors corresponding to terms are pairwise orthogonal. Such an assumption is clearly unrealistic. Although attempts have been made to compensate for this assumption by some separate, corrective steps, such methods are ad hoc and, in most cases, formally inconsistent. In this paper, a generalization of the VSM, called the GVSM, is advanced. The developments provide a solution not only for the computation of a measure of similarity (correlation) between terms, but also for the incorporation of these similarities into the retrieval process. The major strength of the GVSM derives from the fact that it is theoretically sound and elegant. Furthermore, experimental evaluation of the model on several test collections indicates that the performance is better than that of the VSM. Experiments have been performed on some variations of the GVSM, and all these results have also been compared to those of the VSM, based on inverse document frequency weighting. These results and some ideas for the efficient implementation of the GVSM are discussed.
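As a rough illustration of computing term similarities and incorporating them into retrieval, here is a minimal Python sketch. It assumes, for simplicity, that the correlation between two terms can be estimated as the cosine of their columns in a document-term matrix (that is, from document co-occurrence); this is a simplified stand-in, not the GVSM construction described in the paper. The helper names (`term_correlations`, `vsm_score`, `correlated_score`) and the three-term toy index are hypothetical.

```python
from math import sqrt

# Hypothetical type aliases for readability.
Matrix = list[list[float]]
Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine of two vectors; 0.0 if either is all zeros."""
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def term_correlations(doc_term: Matrix) -> Matrix:
    """Estimate t_i . t_j as the cosine of term columns (document co-occurrence).

    Assumption: a simplified stand-in, not the minterm-based GVSM construction.
    """
    n_terms = len(doc_term[0])
    columns = [[row[j] for row in doc_term] for j in range(n_terms)]
    return [[cosine(columns[i], columns[j]) for j in range(n_terms)]
            for i in range(n_terms)]


def vsm_score(doc: Vector, query: Vector) -> float:
    """Orthogonal-term VSM: only terms shared by doc and query contribute."""
    return sum(d * q for d, q in zip(doc, query))


def correlated_score(doc: Vector, query: Vector, corr: Matrix) -> float:
    """Sum over all term pairs of d_i * q_j * (t_i . t_j)."""
    return sum(doc[i] * corr[i][j] * query[j]
               for i in range(len(doc)) for j in range(len(query)))


# Hypothetical toy index over the terms ["car", "automobile", "engine"] (binary weights).
docs = [
    [1.0, 1.0, 0.0],  # doc0: car, automobile
    [0.0, 1.0, 1.0],  # doc1: automobile, engine
    [0.0, 0.0, 1.0],  # doc2: engine
]
query = [1.0, 0.0, 0.0]  # query: "car"

corr = term_correlations(docs)
print([vsm_score(d, query) for d in docs])                          # -> [1.0, 0.0, 0.0]
print([round(correlated_score(d, query, corr), 4) for d in docs])   # -> [1.7071, 0.7071, 0.0]
```

In this toy index, doc1 mentions "automobile" but not "car", so the orthogonal-term score is zero; with the estimated correlations it receives a positive score. Replacing `corr` with the identity matrix recovers the orthogonal-term scores exactly. The double loop costs O(n²) per document for n terms, so for large vocabularies one would likely need to precompute or sparsify the correlation matrix.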


On extending the vector space model for Boolean query processing

Wong

Ziarko

Raghavan

et al. 1986


An information retrieval model, named the Generalized Vector Space Model (GVSM), is extended to handle situations where queries are specified as (extended) Boolean expressions. It is shown that this unified model, unlike currently available alternatives, has the advantage of incorporating term correlations into the retrieval process. The query language extension is attractive in the sense that most of the algebraic properties of the strict Boolean language are still preserved. Although the experimental results for extended Boolean retrieval are not always better than those of the vector processing method, the developments here are significant in enabling commercially available retrieval systems to benefit from vector-based methods. The proposed scheme is compared to the p-norm model advanced by Salton and coworkers. An important conclusion is that it is desirable to investigate further extensions that can offer the benefits of both proposals.
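For reference, the p-norm model referred to in this abstract scores AND and OR queries with operators that interpolate between vector-style averaging and strict Boolean logic. The following is a minimal sketch of the unweighted p-norm operators, assuming document term weights in [0, 1] and ignoring query term weights; the function names and the example document are hypothetical, and the sketch does not implement the GVSM-based Boolean extension itself.

```python
import math


def pnorm_or(weights: list[float], p: float) -> float:
    """Unweighted p-norm OR: ((w_1^p + ... + w_n^p) / n) ** (1/p)."""
    if math.isinf(p):
        return max(weights)  # limit as p -> infinity: strict Boolean OR
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def pnorm_and(weights: list[float], p: float) -> float:
    """Unweighted p-norm AND: 1 - (((1-w_1)^p + ... + (1-w_n)^p) / n) ** (1/p)."""
    if math.isinf(p):
        return min(weights)  # limit as p -> infinity: strict Boolean AND
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


# Hypothetical document: contains term A (weight 1.0), lacks term B (weight 0.0).
doc_weights = [1.0, 0.0]
for p in (1.0, 2.0, math.inf):
    print(p, round(pnorm_or(doc_weights, p), 4), round(pnorm_and(doc_weights, p), 4))
# 1.0 0.5 0.5
# 2.0 0.7071 0.2929
# inf 1.0 0.0
```

With p = 1 both operators reduce to the average of the weights, while p = ∞ gives the strict Boolean max/min; intermediate values of p soften the Boolean behavior.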


Extended Boolean query processing in the generalized vector space model

Wong

Ziarko

Raghavan

et al. 1989

Information Systems


scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA


Part of the Research Solutions Family.

P. C. N. Wong

Generalized vector spaces model in information retrieval

On modeling of information retrieval concepts in vector spaces

On extending the vector space model for Boolean query processing

Extended Boolean query processing in the generalized vector space model
