Abstract. This paper discusses our participation in INEX using the TIJAH XML-IR system. We have enriched the TIJAH system, which follows a standard layered database architecture, with several new features. An extensible conceptual-level processing unit has been added to the system. The algebra on the logical level and the implementation on the physical level have been extended to support phrase search and structural relevance feedback. The conceptual processing unit rewrites NEXI content-only and content-and-structure queries into the internal form according to a retrieval model parameter specification that is either predefined or derived from relevance feedback. Relevance feedback parameters are produced by data fusion of result element score values and sizes with the relevance assessments. We discuss the new operators that support phrase search in score region algebra on the logical level, their implementation on the physical level using the pre-post numbering scheme, and the framework for structural relevance feedback. We conclude with a preliminary analysis of the system performance based on INEX 2004 runs.
This paper discusses our participation in INEX (the Initiative for the Evaluation of XML Retrieval) using the TIJAH XML-IR system. TIJAH's system design follows a 'standard' layered database architecture, carefully separating the conceptual, logical and physical levels. At the conceptual level, we classify the INEX XPath-based query expressions into three different query patterns. For each pattern, we present its mapping into a query execution strategy. The logical layer exploits score region algebra (SRA) as the basis for query processing. We discuss the region operators used to select and manipulate XML document components. The logical algebra expressions are mapped into efficient relational algebra expressions over a physical representation of the XML document collection using the 'pre-post numbering scheme'. The paper concludes with an analysis of experiments performed with the INEX test collection.
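The pre-post numbering scheme mentioned above can be illustrated with a minimal Python sketch. It is not TIJAH's relational implementation: the helper names `number_nodes` and `is_descendant` and the toy document are assumptions made for illustration. Each element gets its rank in a pre-order and a post-order traversal, and an element is a descendant of another exactly when its pre rank is larger and its post rank is smaller.

```python
import xml.etree.ElementTree as ET


def number_nodes(root):
    """Assign (pre, post) ranks to every element in one depth-first walk."""
    ranks = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]          # rank on entering the node
        counters["pre"] += 1
        for child in node:
            visit(child)
        ranks[node] = (pre, counters["post"])  # post rank on leaving the node
        counters["post"] += 1

    visit(root)
    return ranks


def is_descendant(ranks, ancestor, node):
    """Containment test: the ancestor starts earlier and finishes later."""
    a_pre, a_post = ranks[ancestor]
    n_pre, n_post = ranks[node]
    return a_pre < n_pre and n_post < a_post


# Hypothetical toy document, not taken from the INEX collection.
doc = ET.fromstring("<article><sec><p>xml</p><p>ir</p></sec><bdy/></article>")
ranks = number_nodes(doc)
sec = doc.find("sec")
first_p = doc.findall(".//p")[0]
bdy = doc.find("bdy")

print(is_descendant(ranks, sec, first_p))  # the paragraph lies inside the section
print(is_descendant(ranks, sec, bdy))      # the sibling element does not
```

Because the test reduces to two integer comparisons, containment between regions can be expressed as a range condition over two columns, which is what makes the scheme attractive for a relational back end.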
Centrum voor Wiskunde en Informatica, INformation Systems
Structural features in content oriented XML retrieval
G. Ramírez
Abstract. The structural features of XML components are an extra source of information that should be used in a content-oriented retrieval task on this type of document. This paper explores three different structural features from the INEX collection that could be used in content-oriented search. We analyse the gain this knowledge could add to the performance of an information retrieval system, and present a first approach to extracting this structural information from a relevance feedback process so that it can be used as priors in a language modelling framework.
Introduction. Content-oriented XML retrieval differs from traditional document retrieval, not only in that the retrieval system has to decide which is the most appropriate unit to return to the user, but also because the document contains extra information on how its content is structured. The implicit semantics of how and why the documents are organised in a certain way might help the information system to retrieve the most relevant information for a user need. Using this structural knowledge might not only help to decide what the best retrieval unit is for a given query, but may also improve the effectiveness of content-oriented search. The area has been studied for a number of years now, and an XML retrieval system benchmark (INEX) has been organised in the last three years [FGKL02].
However, so far the structural information in documents has hardly been used, and most systems, including our own, have treated XML retrieval as traditional document retrieval. The main difference is that the retrieved units, traditionally documents, can be any element in the XML tree, ranging from paragraphs and sections to full articles or even complete journals. This paper analyses the information available in the structure of the documents and shows how this information can be useful. To this end, we analyse the relevance assessments for INEX 2004 [FGKL02] and compare the structural information available in the set of elements that has been judged relevant to the structural information in retrieved elements and in the collection in general. The differences in structural characteristics between relevant elements and other elements could be exploited to improve retrieval results. We do not go into much detail on how to obtain relevance information. That process does not differ from the one used in traditional content-based feedback. The paper is organised as follows. Section 2 gives an overview of work done in the area of content-oriented ...
Abstract. Retrieving information from heterogeneous data sources in a flexible manner and within a single (database) framework is still a challenge. In this paper we present several extensions of our prototype database system TIJAH developed for structured retrieval. The extensions are aimed at modeling vague selection of XML elements and image retrieval. All three levels (conceptual, logical, and physical) of the TIJAH system are enhanced to support the extensions. Additionally, we analyze different ways of removing overlap and explain how structural information can be used for relevance feedback.
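As an illustration of what removing overlap involves, the following Python sketch shows one simple greedy strategy. It is an assumption for illustration, not one of the methods analysed by the authors. The `Hit` record, the function names, and the scores are hypothetical. Each result element carries pre-order and post-order ranks, so one element contains another exactly when its pre rank is not larger and its post rank is not smaller.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str     # label for display only
    pre: int      # pre-order rank of the element
    post: int     # post-order rank of the element
    score: float  # retrieval score (hypothetical values below)


def overlaps(a, b):
    """True if one element contains the other (or they are the same element)."""
    a_contains_b = a.pre <= b.pre and b.post <= a.post
    b_contains_a = b.pre <= a.pre and a.post <= b.post
    return a_contains_b or b_contains_a


def remove_overlap(hits):
    """Greedy pass: visit hits by descending score, keep non-overlapping ones."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept


# Toy ranked list: an article, one of its sections, a paragraph in that
# section, and a sibling element of the section.
hits = [
    Hit("article", 0, 4, 0.4),
    Hit("sec", 1, 2, 0.7),
    Hit("p1", 2, 0, 0.9),
    Hit("bdy", 4, 3, 0.5),
]
result = remove_overlap(hits)
print([h.name for h in result])  # → ['p1', 'bdy']
```

This variant always favours the highest-scoring element on a path from the root; other choices, such as preferring the ancestor or re-scoring after each removal, would change which units survive.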