We introduce a new approach to Information Retrieval based on Hidden Markov Models (HMMs). HMMs are shown to provide a mathematically sound framework for retrieving both documents with predefined boundaries and entities of information of arbitrary length and format (passage retrieval). Our retrieval model offers several promising capabilities. First, the positions of occurrences of indexing features can be used for indexing; positional information is essential, for instance, when considering phrases, negation, and the proximity of features. Second, optimal weights for arbitrary features can be derived automatically from training collections. Third, a query-dependent structure can be determined for every document by segmenting it into passages that are either relevant or irrelevant to the query. The theoretical analysis of our retrieval model is complemented by the results of preliminary experiments.
Introduction

We introduce a new approach to Information Retrieval, i.e. document retrieval and passage retrieval. Documents are considered as being produced by stochastic processes: a first stochastic process generates text fragments that are relevant to a certain query, and a second stochastic process generates text fragments independent of any particular query. The generation of text fragments by the two processes is modeled by means of two Hidden Markov Models (HMMs). Whether one of these two HMMs generates a text fragment with high or low probability depends on the distribution of the query features within the text fragment.

In the case of document retrieval, each document is assigned a score that depends on the ratio of the probability that the document was generated by the first stochastic process to the probability that it was generated by the second. As usual, the documents are presented to the user in decreasing order of their scores. In the case of passage retrieval, the score of a passage depends on the probability that the passage itself was generated by the first stochastic process while the text fragments before and after the passage were generated by the second.

Three problems are considered difficult in Information Retrieval. First, it is not well understood how complex features (e.g. phrases, proximity data, negations, co-occurrence and co-citation data) should be used for indexing. Second, we lack a general weighting scheme for arbitrary indexing features and arbitrary document collections. Third, the optimal segmentation of a long document into segments that are either relevant or irrelevant to a query remains an open problem. Our approach offers promising capabilities to solve these three problems at least partially.
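The document-retrieval scoring rule described above can be sketched as a log-likelihood ratio between two HMMs, each evaluated with the forward algorithm. The one-state toy models and all probability values below are illustrative assumptions, not the paper's trained parameters:

```python
import math

def logsumexp(xs):
    # Numerically stable log of a sum of exponentials.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_loglik(tokens, start, trans, emit, unseen=1e-6):
    """Log P(tokens | HMM), computed with the forward algorithm in log space."""
    states = list(start)
    alpha = {s: math.log(start[s]) + math.log(emit[s].get(tokens[0], unseen))
             for s in states}
    for tok in tokens[1:]:
        alpha = {s: logsumexp([alpha[p] + math.log(trans[p][s]) for p in states])
                    + math.log(emit[s].get(tok, unseen))
                 for s in states}
    return logsumexp(list(alpha.values()))

def llr_score(tokens, relevant_hmm, general_hmm):
    """Document score: log ratio of the two generating-process likelihoods."""
    return forward_loglik(tokens, *relevant_hmm) - forward_loglik(tokens, *general_hmm)

# Toy one-state models: the "relevant" process favors the query term "retrieval".
relevant = ({"r": 1.0}, {"r": {"r": 1.0}},
            {"r": {"retrieval": 0.5, "model": 0.3, "the": 0.2}})
general = ({"g": 1.0}, {"g": {"g": 1.0}},
           {"g": {"retrieval": 0.05, "model": 0.15, "the": 0.8}})

on_topic = ["the", "retrieval", "model"]
off_topic = ["the", "the", "the"]
```

Ranking documents by `llr_score` in decreasing order yields the presentation order described above; with more than one state per model, the relevant process can also capture the positional structure the paper emphasizes.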
First, information about the positions of features can be preserved, because in our approach a document is considered as being produced by a stochastic process. Conventional retrieval models do not take into account the positions where...
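The query-dependent segmentation of a document into relevant and irrelevant passages can be sketched with a two-state HMM, one "relevant" state R and one "irrelevant" state I, decoded with the Viterbi algorithm. All probabilities below are illustrative assumptions, not values from the paper:

```python
import math

def emit_logp(state, tok, query_terms):
    # A relevant passage emits query terms more often (illustrative values).
    p_q = 0.3 if state == "R" else 0.02
    return math.log(p_q if tok in query_terms else 1.0 - p_q)

def segment(tokens, query_terms, p_stay=0.9):
    """Label each token position "R" (relevant) or "I" (irrelevant)
    using the most probable Viterbi state path."""
    states = ("R", "I")
    log_trans = {a: {b: math.log(p_stay if a == b else 1.0 - p_stay)
                     for b in states} for a in states}
    v = {s: math.log(0.5) + emit_logp(s, tokens[0], query_terms) for s in states}
    backptr = []
    for tok in tokens[1:]:
        nv, bp = {}, {}
        for s in states:
            prev = max(states, key=lambda p: v[p] + log_trans[p][s])
            nv[s] = v[prev] + log_trans[prev][s] + emit_logp(s, tok, query_terms)
            bp[s] = prev
        v = nv
        backptr.append(bp)
    # Backtrace from the best final state.
    path = [max(states, key=lambda s: v[s])]
    for bp in reversed(backptr):
        path.append(bp[path[-1]])
    return list(reversed(path))
```

Maximal runs of consecutive "R" labels in the returned path correspond to the relevant passages; the `p_stay` parameter controls how strongly the decoder resists fragmenting the document into many short segments.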
This paper presents four novel techniques for open-vocabulary spoken document retrieval: a method to detect slots that possibly contain a query feature; a method to estimate occurrence probabilities; a technique that we call collection-wide probability re-estimation; and a weighting scheme which takes advantage of the fact that long query features are detected more reliably. These four techniques have been evaluated on the TREC-6 spoken document retrieval test collection to determine the improvements in retrieval effectiveness with respect to a baseline retrieval method. Results show that retrieval effectiveness can be improved considerably despite the large number of speech recognition errors.
We present an information retrieval system that allows searching text and speech documents simultaneously. The retrieval system accepts vague queries and performs a best-match search to find those documents that are relevant to the query. The output of the retrieval system is a ranked list of documents in which the documents at the top of the list best satisfy the user's information need. The relevance of the documents is estimated by means of metadata (document description vectors). The metadata is generated automatically and organized such that queries can be processed efficiently. We introduce a controlled indexing vocabulary for both speech and text documents. The size of the new indexing vocabulary is small (1,000 features) compared with the sizes of indexing vocabularies in conventional text retrieval (10,000 to 100,000 features). We show that the retrieval effectiveness based on such a small indexing vocabulary is similar to that of a Boolean retrieval system.
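A best-match search over such automatically generated description vectors can be sketched as follows. The tiny stand-in vocabulary and the cosine measure are illustrative assumptions, not the system's actual vocabulary or similarity function:

```python
import math
from collections import Counter

# Illustrative stand-in for the small controlled indexing vocabulary.
VOCAB = ("speech", "text", "retrieval", "query", "index", "document")

def describe(tokens):
    """Automatically generated metadata: a feature-count vector over VOCAB."""
    counts = Counter(t for t in tokens if t in VOCAB)
    return [counts[f] for f in VOCAB]

def cosine(u, v):
    # Cosine similarity between two description vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_tokens, documents):
    """Best-match search: document indices in decreasing order of score."""
    q = describe(query_tokens)
    scores = [(cosine(q, describe(d)), i) for i, d in enumerate(documents)]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]
```

Because every description vector has only as many dimensions as the controlled vocabulary has features, the metadata stays compact and the ranking can be computed efficiently, which is the practical point of keeping the vocabulary small.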