Statistical measures of word similarity have application in many areas of natural language processing, such as language modeling and information retrieval. We report a comparative study of two methods for estimating the word cooccurrence frequencies required by word similarity measures. Our frequency estimates are generated from a terabyte-sized corpus of Web data, and we study the impact of corpus size on the effectiveness of the measures. We base the evaluation on one TOEFL question set and two practice question sets, each consisting of a number of multiple-choice questions seeking the best synonym for a given target word. For two of the question sets, a context for the target word is provided, and we examine a number of word similarity measures that exploit this context. Our best combination of similarity measure and frequency estimation method answers 6-8% more questions than the best results previously reported for the same question sets.
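As an illustration of the kind of measure involved, the sketch below scores synonym candidates with pointwise mutual information (PMI), one common cooccurrence-based similarity measure; the counts, vocabulary, and helper names are invented placeholders, not values or code from the paper.

```python
# Sketch: answering a multiple-choice synonym question with PMI
# computed from corpus cooccurrence counts. All counts are toy values.
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log( P(x, y) / (P(x) * P(y)) )."""
    if count_xy == 0:
        return float("-inf")
    return math.log((count_xy / total) /
                    ((count_x / total) * (count_y / total)))

def best_synonym(target, choices, cooc, unigram, total):
    """Pick the choice with the highest PMI against the target word."""
    return max(
        choices,
        key=lambda c: pmi(cooc.get((target, c), 0),
                          unigram[target], unigram[c], total),
    )

# Toy counts standing in for estimates from a large Web corpus.
unigram = {"ship": 900, "boat": 400, "train": 500, "book": 800}
cooc = {("ship", "boat"): 120, ("ship", "train"): 15, ("ship", "book"): 5}

print(best_synonym("ship", ["boat", "train", "book"], cooc, unigram,
                   total=1_000_000))  # -> "boat"
```

Context-sensitive variants of such measures would additionally condition the counts on the words surrounding the target.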
A common approach to the vocabulary mismatch problem is to augment the original query using dictionaries and other lexical resources and/or by looking at pseudo-relevant documents. Either way, terms are added to form a new query that is used to score all documents in a subsequent retrieval pass, and as a consequence the original query's focus may drift because of the newly added terms. We propose a new method that addresses the vocabulary mismatch problem by expanding original query terms only when necessary, complementing the user query with replacements for missing terms while scoring documents. This allows related semantic aspects to be included in a conservative and selective way, reducing the possibility of query drift. Our results using replacements for the missing query terms in modified document and passage retrieval methods show significant improvements over the original methods.
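A minimal sketch of this selective-replacement idea, assuming a simple term-frequency scorer and a precomputed table of related terms (both invented for illustration): a query term absent from a document contributes through its best related term, discounted to limit drift, instead of triggering a full expansion pass.

```python
# Sketch: on-the-fly replacement of missing query terms during scoring.
# The related-term table, weights, and alpha discount are placeholders.

def score(query, doc_tf, related, alpha=0.5):
    """Sum term frequencies; substitute related terms only when needed."""
    total = 0.0
    for term in query:
        if term in doc_tf:
            total += doc_tf[term]  # term matches the document directly
        else:
            # Replace the missing term with its best related term present
            # in the document, discounted by alpha to limit query drift.
            candidates = [doc_tf[r] for r in related.get(term, [])
                          if r in doc_tf]
            if candidates:
                total += alpha * max(candidates)
    return total

related = {"car": ["automobile", "vehicle"]}
doc_tf = {"automobile": 3, "insurance": 2}
print(score(["car", "insurance"], doc_tf, related))  # 0.5*3 + 2 = 3.5
```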
We present a framework for the fast computation of lexical affinity models. The framework is composed of a novel algorithm that efficiently computes the co-occurrence distribution between pairs of terms, an independence model, and a parametric affinity model. In contrast with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus. The framework is flexible and scalable, allowing fast adaptation to new applications. We apply it in combination with a terabyte corpus to answer natural language tests, achieving encouraging results.
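As a hedged sketch of the basic statistic such a framework builds on, the code below computes a distance histogram for a term pair, i.e. how often two terms co-occur at each separation in a token stream; a real implementation over a terabyte corpus would work from positional indexes rather than an in-memory token list.

```python
# Sketch: co-occurrence counts of a term pair by distance.
from collections import Counter

def distance_histogram(tokens, a, b, max_dist=50):
    """Count co-occurrences of a and b at each distance up to max_dist."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    hist = Counter()
    for i in pos_a:
        for j in pos_b:
            d = abs(i - j)
            if 0 < d <= max_dist:
                hist[d] += 1
    return hist

tokens = "the cat sat near the dog while the cat watched the dog".split()
print(distance_histogram(tokens, "cat", "dog"))  # Counter({3: 2, 4: 1, 10: 1})
```

One plausible reading, an assumption on my part rather than a detail from the abstract, is that the independence model supplies the histogram expected for unrelated terms, with the parametric affinity model fitting deviations from that baseline.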
Using our question answering system, we executed questions from the TREC 2001 evaluation over a series of Web data collections, with collection sizes increasing from 25 gigabytes up to nearly a terabyte.