In the postwar development of computing, most people thought of computers as machines for numerical applications. But some saw the potential for automatic text processing tasks, notably translation and document indexing and searching, even though words seemed much messier as data than numbers. For Roger, as one of these early researchers, building systems for language processing was both intellectually challenging and practically useful, and in the late 1950s he began to work on document retrieval (Needham 1963). The specialised scientific literature was growing too fast for the existing broadly-based and rigid indexing and classification schemes. This lack of appropriate retrieval tools, and the opportunities offered by computers, stimulated a critical examination of existing approaches to indexing and searching and the introduction of radically new ones.

Document (or text) retrieval systems, like the libraries before them, depend on a model of how documents should be characterised to facilitate searching, and of what makes an effective search strategy. Many models for retrieval systems have been proposed since the 1950s. The most innovative, attractive, and successful have been those that, unlike the earlier library models, have exploited the behaviour of the actual words used in document texts, and have supported flexible matching between queries and documents, leading to a ranked search output. These core features of modern systems fit automation very well, and automation in turn has made it possible to exploit the distribution of terms in documents, for example through term weighting. There are, however, different ways of modelling retrieval systems within this broad framework, and until recently it has not been possible to provide concrete evidence for the real value and relative merits of the competing models.
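The word-based approach described above can be illustrated with a minimal sketch of tf-idf term weighting and ranked matching. This is an illustrative modern rendering, not the formulation used by any of the historical systems discussed here; the function and variable names, and the choice of tf-idf variant, are assumptions for the example only.

```python
import math
from collections import Counter

def tf_idf_ranking(query_terms, documents):
    """Rank documents against a query by summed tf-idf term weights.

    Each document is a list of words. Returns (doc_index, score) pairs,
    best-matching documents first -- a ranked search output.
    """
    n = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in documents:
        for term in set(doc):
            df[term] += 1
    scores = []
    for i, doc in enumerate(documents):
        tf = Counter(doc)  # within-document term frequencies
        score = 0.0
        for term in query_terms:
            if term in tf:
                # Rarer terms across the collection get higher weight.
                idf = math.log(n / df[term])
                score += tf[term] * idf
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

docs = [
    "indexing and searching of scientific documents".split(),
    "numerical applications of computing machines".split(),
    "automatic indexing of document texts".split(),
]
ranking = tf_idf_ranking("automatic indexing".split(), docs)
```

Here the third document ranks first, since it matches both query terms and 'automatic' occurs in only one document of the collection, so it carries the highest weight.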
It has been impracticable to conduct the necessary large-scale retrieval experiments, because performance evaluation depends on knowing which documents are relevant to a query, and obtaining this information is extremely expensive. This situation has changed in a number of ways. The development of the Web and the proliferation of machine-readable text (in the broadest sense) have made the 'information layer' and its operations much more central to computing in general than they were in the 1950s. 'Retrieval' is now taken to encompass a wide range of different tasks. Probably as a consequence, substantially more resources have over the last decade or two become available for