Elke Mittendorf scite author profile

Elke Mittendorf

4 Publications

63 Citation Statements Received

21 Citation Statements Given

Affiliations

École Polytechnique Fédérale de Lausanne

Publications

Document and Passage Retrieval Based on Hidden Markov Models

Mittendorf, Schäuble

1994

We introduce a new approach to Information Retrieval based on Hidden Markov Models (HMMs). HMMs are shown to provide a mathematically sound framework for retrieving documents, that is, documents with predefined boundaries, and also entities of information of arbitrary length and format (passage retrieval). Our retrieval model is shown to offer three promising capabilities. First, the positions at which indexing features occur can be used for indexing. Positional information is essential, for instance, when considering phrases, negation, and the proximity of features. Second, optimal weights for arbitrary features can be derived automatically from training collections. Third, a query-dependent structure can be determined for every document by segmenting it into passages that are either relevant or irrelevant to the query. The theoretical analysis of our retrieval model is complemented by the results of preliminary experiments.

Introduction

We introduce a new approach to Information Retrieval, i.e. document retrieval and passage retrieval. Documents are considered as being produced by stochastic processes. A first stochastic process generates text fragments that are relevant to a certain query. A second stochastic process generates text fragments independent of any particular query. The generation of text fragments by the two stochastic processes is modeled by means of two Hidden Markov Models (HMMs). Whether one of these two HMMs generates a text fragment with a high probability or with a low probability depends on the distribution of the query features within the text fragment.

In the case of document retrieval, each document is assigned a score that depends on the ratio of the probability that the document was generated by the first stochastic process to the probability that it was generated by the second stochastic process. As usual, the documents are presented to the user in decreasing order of their scores.

In the case of passage retrieval, the score of a passage depends on the probability that the passage itself was generated by the first stochastic process and that the text fragments before and after the passage were generated by the second stochastic process.

Three problems are considered difficult in Information Retrieval. First, it is not well known how complex features (e.g. phrases, proximity data, negations, cooccurrence and cocitation data) should be used for indexing. Second, we lack a general weighting scheme for arbitrary indexing features and for arbitrary document collections. Third, the optimal segmentation of a long document into segments that are either relevant or irrelevant to a query is another open problem. Our approach offers promising capabilities to solve these three problems at least partially. First, information about the positions of features can be conserved, because in our approach a document is considered as being produced by a stochastic process. Conventional retrieval models do not take into account the positions where...
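The document-scoring idea described above (the ratio of two generation probabilities) can be illustrated with a minimal sketch. This is not the authors' model: the state layout, every probability value, and the names `forward_log_likelihood`, `RELEVANT`, `BACKGROUND`, `to_observations`, and `document_score` are assumptions made for illustration. Each token is reduced to a binary observation (1 if it is a query feature, 0 otherwise), each HMM is evaluated with the standard scaled forward algorithm, and the score is the log of the probability ratio.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm."""
    if not obs:
        return 0.0  # an empty sequence has probability 1 under any model
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        # Rescale to avoid underflow; the log of the scale factors
        # accumulates to the log-likelihood of the whole sequence.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical parameters, chosen only for illustration.
# emit[state][0] = P(non-query token), emit[state][1] = P(query feature).
RELEVANT = dict(
    start=[0.5, 0.5],
    trans=[[0.8, 0.2], [0.2, 0.8]],
    emit=[[0.70, 0.30],   # "on-topic" state: query features are frequent
          [0.95, 0.05]],  # "filler" state inside relevant text
)
BACKGROUND = dict(
    start=[1.0],
    trans=[[1.0]],
    emit=[[0.98, 0.02]],  # query features are rare in general text
)


def to_observations(tokens, query_terms):
    """Map each token to 1 if it is a query feature, else 0."""
    return [1 if tok.lower() in query_terms else 0 for tok in tokens]


def document_score(tokens, query_terms):
    """Log-ratio of P(doc | relevant HMM) to P(doc | background HMM)."""
    obs = to_observations(tokens, query_terms)
    return (forward_log_likelihood(obs, **RELEVANT)
            - forward_log_likelihood(obs, **BACKGROUND))


if __name__ == "__main__":
    query = {"hidden", "markov", "retrieval"}
    docs = [
        "hidden markov models for passage retrieval".split(),
        "the weather was pleasant all week".split(),
    ]
    # Present documents in decreasing order of their scores.
    for doc in sorted(docs, key=lambda d: document_score(d, query), reverse=True):
        print(round(document_score(doc, query), 3), " ".join(doc))
```

Because the observations keep token order, a model of this kind can in principle reward query features that occur close together, which is the positional information the abstract emphasizes. Passage retrieval would additionally require segmenting the sequence between the two models, which this sketch does not attempt.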

Applying probabilistic term weighting to OCR text in the case of a large alphabetic library catalogue

Mittendorf, Schäuble, Sheridan

1995

Untitled

Mittendorf, Schäuble

2000

Data corruption and information retrieval

Mittendorf

1998

