In this paper we focus on sentence retrieval, which is similar to document retrieval but with a smaller unit of retrieval. Data pre-processing is generally considered useful in document retrieval, but for sentence retrieval the situation is less clear. We use the TF-ISF (term frequency-inverse sentence frequency) method for sentence retrieval. As pre-processing steps we use stop word removal and two language processing techniques, stemming and lemmatization. We also experiment with different query lengths. The results show that pre-processing with stemming and lemmatization is useful for sentence retrieval, as it is for document retrieval. Lemmatization produces better results with longer queries, while stemming produces worse results with longer queries. For the experiments we used data from the Text Retrieval Conference (TREC) novelty tracks.
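To make the pipeline concrete, here is a minimal sketch of TF-ISF ranking with optional stop word removal and stemming. It assumes one widely used TF-ISF formulation, R(s|q) = Σ_{t∈q} log(tf_{t,q}+1) · log(tf_{t,s}+1) · log((n+1)/(0.5+sf_t)), where n is the number of sentences and sf_t the number of sentences containing term t; the exact weighting used in the paper may differ. The stop word list and the suffix-stripping `crude_stem` function are hypothetical stand-ins (a real system would use a full stop list and, e.g., a Porter stemmer), and lemmatization is omitted because it needs a lexicon.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real systems use longer lists).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are", "for"}


def crude_stem(word):
    """Hypothetical suffix-stripping stemmer, a stand-in for e.g. Porter stemming."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, remove_stop_words=True, stem=True):
    """Lowercase, tokenize, and optionally drop stop words and stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return tokens


def tf_isf_scores(query, sentences, **opts):
    """Score every sentence against the query with TF-ISF."""
    sent_tf = [Counter(preprocess(s, **opts)) for s in sentences]
    query_tf = Counter(preprocess(query, **opts))
    n = len(sentences)
    # sf[t] = number of sentences that contain term t at least once
    sf = Counter(t for tf in sent_tf for t in tf)
    scores = []
    for tf in sent_tf:
        score = 0.0
        for t, qtf in query_tf.items():
            if t in tf:
                score += (math.log(qtf + 1) * math.log(tf[t] + 1)
                          * math.log((n + 1) / (0.5 + sf[t])))
        scores.append(score)
    return scores


if __name__ == "__main__":
    sentences = [
        "Stemming reduces words to stems.",
        "The cat sat on the mat.",
        "Lemmatization maps words to lemmas.",
    ]
    query = "stemming words"
    ranked = sorted(zip(tf_isf_scores(query, sentences), sentences), reverse=True)
    for score, sentence in ranked:
        print(f"{score:.3f}  {sentence}")
```

Passing `stem=False` or `remove_stop_words=False` to `tf_isf_scores` switches the corresponding pre-processing step off, which is the kind of comparison the experiments above describe.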
Sentence retrieval consists of retrieving relevant sentences from a document base in response to a query. Question answering, novelty detection, summarization, opinion mining and information provenance all make use of sentence retrieval. Most sentence retrieval methods are trivial adaptations of document retrieval methods. However, some newer sentence retrieval methods based on the language modeling framework successfully use some form of sentence context. In contrast, there has been no successful improvement of the TF-ISF method that takes the context of sentences into account. In this paper we propose a recursive TF-ISF-based method that takes into account the local context of a sentence, defined as the previous and the next sentence. We compared the new method to the TF-ISF baseline and to an earlier, unsuccessful method that incorporates a similar context into TF-ISF. Our method achieved statistically significant improvements over both. An additional benefit of our method is its clear, explicit model of context, which will allow us to automatically generate a document representation with context suitable for sentence retrieval, an important goal of our future work.
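The abstract does not give the recursive formula, so the sketch below is not the authors' method. It only illustrates, under a hypothetical linear weighting, how a sentence's TF-ISF score can be combined with the scores of its previous and next sentence. The function name `context_scores` and the weight `alpha` are assumptions made for illustration.

```python
def context_scores(base_scores, alpha=0.5):
    """Mix each sentence's score with those of its previous and next sentence.

    base_scores: TF-ISF scores of one document's sentences, in document order.
    alpha: hypothetical weight given to the neighbouring sentences.
    Sentences at the start or end of the document have only one neighbour;
    the missing neighbour contributes 0.
    """
    n = len(base_scores)
    result = []
    for i, own in enumerate(base_scores):
        prev_score = base_scores[i - 1] if i > 0 else 0.0
        next_score = base_scores[i + 1] if i < n - 1 else 0.0
        result.append(own + alpha * (prev_score + next_score))
    return result


if __name__ == "__main__":
    # Only the second sentence matches the query; its neighbours gain some score.
    print(context_scores([0.0, 2.0, 0.0, 0.0]))
```

This single smoothing step lets a sentence with no query terms rank above unrelated sentences when it sits next to a strongly matching one; a recursive variant would instead let context propagate further than one sentence.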
In this paper we combine our previous research in the field of the Semantic Web, especially ontology learning and population, with sentence retrieval. To do this, we developed a new approach to sentence retrieval by modifying our previous TF-ISF method, which uses local context information, so that it takes into account only document-level information. This approach to sentence retrieval is presented for the first time in this paper and is compared to existing methods that use information from the whole document collection. With this approach and the developed document-level sentence retrieval methods, it is possible to assess the relevance of a sentence using only information from the retrieved sentence's document, and to define a document-level OWL representation for sentence retrieval that can be automatically populated. This supports the idea of the Semantic Web through automatic and semi-automatic extraction of additional information from existing web resources. The additional information is formatted as an OWL document containing the relevance information of the document's sentences for sentence retrieval.
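The difference between collection-level and document-level TF-ISF lies in where the sentence count n and the sentence frequencies sf_t come from. A minimal sketch, assuming the same common TF-ISF formulation log(tf_{t,q}+1) · log(tf_{t,s}+1) · log((n+1)/(0.5+sf_t)) and plain lowercase tokenization (both assumptions, not taken from the paper), restricts these statistics to the sentences of a single document. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens (no stop word removal or stemming here)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def document_level_tf_isf(query, document_sentences):
    """TF-ISF in which n and sf are computed from one document only.

    A collection-level variant would count sentence frequencies over every
    sentence in the collection; here only document_sentences are used.
    """
    sent_tf = [Counter(tokens(s)) for s in document_sentences]
    query_tf = Counter(tokens(query))
    n = len(document_sentences)
    # sf[t] = number of this document's sentences containing term t
    sf = Counter(t for tf in sent_tf for t in tf)
    return [
        sum(math.log(q + 1) * math.log(tf[t] + 1)
            * math.log((n + 1) / (0.5 + sf[t]))
            for t, q in query_tf.items() if t in tf)
        for tf in sent_tf
    ]


if __name__ == "__main__":
    doc = [
        "OWL is an ontology language.",
        "Ontology population adds instances.",
        "The weather was fine.",
    ]
    print(document_level_tf_isf("ontology population", doc))
```

Because n and sf depend only on the document itself, they can be computed once per document and stored with it, which is what makes an automatically populated per-document representation feasible.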
Sentence retrieval is an information retrieval technique that aims to find sentences corresponding to an information need. It is used for tasks such as question answering (QA) and novelty detection. Since it is similar to document retrieval but with a smaller unit of retrieval, document retrieval methods such as term frequency-inverse document frequency (TF-IDF), BM25, and language modeling-based methods are also used for sentence retrieval. The effect of partial matching of words on sentence retrieval has not yet been analyzed. We believe there is substantial potential for improving sentence retrieval methods with this approach. We adapted TF-ISF, BM25, and language modeling-based methods to test partial matching of terms by combining sentence retrieval with sequence similarity, which allows matching of words that are similar but not identical. All tests were conducted using data from the novelty tracks of the Text Retrieval Conference (TREC). The aim of this paper was to determine whether such an approach is generally beneficial to sentence retrieval; we did not examine in depth how partial matching helps or hinders finding relevant sentences.
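Partial matching can be illustrated by replacing the exact term frequency tf_{t,s} with a "soft" count over similar words. The sketch below uses Python's `difflib.SequenceMatcher` ratio as the sequence similarity measure; this is a stand-in, since the abstract does not say which measure the authors used, and the function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def partial_tf(term, sentence_tokens, threshold=0.8):
    """Soft term frequency: count exact and near matches of term in a sentence.

    Exact matches contribute 1.0, near matches contribute their similarity
    score, and words below the (hypothetical) threshold contribute nothing.
    """
    total = 0.0
    for word in sentence_tokens:
        sim = 1.0 if word == term else similarity(term, word)
        if sim >= threshold:
            total += sim
    return total


if __name__ == "__main__":
    # "retrieve" shares the block "retriev" with "retrieval": ratio 14/17.
    print(partial_tf("retrieval", ["sentence", "retrieve", "methods"]))
```

In a TF-ISF, BM25, or language modeling scorer, `partial_tf` would take the place of the exact count, so a query term such as "retrieval" also gives credit to "retrieve". The threshold controls the trade-off: lower values allow more matches but also more spurious ones.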