The Initiative for the Evaluation of XML retrieval (INEX) provides a TREC-like platform for evaluating content-oriented XML retrieval systems. Since 2007, INEX has been using a set of precision-recall-based metrics for its ad hoc tasks. The authors investigate the reliability and robustness of these focused retrieval measures, and of the INEX pooling method. They explore four specific questions: How reliable are the metrics when assessments are incomplete, or when query sets are small? What is the minimum pool/query-set size that can be used to reliably evaluate systems? Can the INEX collections be used to fairly evaluate "new" systems that did not participate in the pooling process? And, for a fixed amount of assessment effort, would this effort be better spent in thoroughly judging a few queries, or in judging many queries relatively superficially? The authors' findings validate properties of precision-recall-based metrics observed in document retrieval settings. Early precision measures are found to be more error-prone and less stable under incomplete judgments and small topic-set sizes. They also find that system rankings remain largely unaffected even when assessment effort is substantially (but systematically) reduced, and confirm that the INEX collections remain usable when evaluating nonparticipating systems. Finally, they observe that for a fixed amount of effort, judging shallow pools for many queries is better than judging deep pools for a smaller set of queries. However, when judging only a random sample of a pool, it is better to completely judge fewer topics than to partially judge many topics. This result confirms the effectiveness of pooling methods.
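The pooling and precision measures discussed above can be illustrated with a toy sketch (this is not INEX's actual tooling; the runs, document ids, and pool depth below are invented for illustration). The pool is the union of the top-k results across participating systems, and only pooled items are judged; precision at rank k is then computed against those judgments:

```python
# Toy illustration of pooling and precision@k (hypothetical runs and ids,
# not the actual INEX assessment pipeline).

def build_pool(runs, depth):
    """Union of the top-`depth` results from each system's ranked run."""
    pool = set()
    for run in runs:
        pool.update(run[:depth])
    return pool

def precision_at_k(run, relevant, k):
    """Fraction of the first k retrieved items that are judged relevant."""
    return sum(1 for doc in run[:k] if doc in relevant) / k

# Three hypothetical ranked runs over toy document ids.
runs = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d5", "d1", "d6"],
    ["d7", "d2", "d8", "d1"],
]

pool = build_pool(runs, depth=2)       # only pooled items get judged
relevant = {"d1", "d2", "d5"} & pool   # judgments restricted to the pool

print(sorted(pool))                         # ['d1', 'd2', 'd5', 'd7']
print(precision_at_k(runs[0], relevant, 2)) # 1.0
```

Varying `depth` in such a simulation is one way to mimic the "shallow pools for many queries vs. deep pools for few queries" trade-off that the paper studies.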
Introduction

Content-oriented XML retrieval is a domain of information retrieval (IR) that has been receiving increasing attention in recent years. The widespread use of eXtensible Markup Language (XML) as a standard document format on the Web and in digital libraries has led to the continuous growth of XML information repositories. This growth has been matched by increasing efforts in the development of XML IR systems that support content-oriented XML retrieval. Besides the content, these systems also exploit structural information, both syntactic and semantic, provided by the XML markup, to return document components or XML elements instead of whole documents in response to a user query. This type of focused retrieval is particularly useful when dealing with collections of long documents or documents covering a wide variety of topics (e.g., books, user manuals, legal documents), because the effort required from users to locate relevant content can be reduced by directing them to the most relevant document components. As the number of XML retrieval systems increases, so does the need to evaluate their effectiveness. The Initiative for the Evaluation of XML retrieval (INEX; 2009), set up in 2002, has been responsible for creating a Cranfield-style infrastructure for evaluating the effectiveness of content-oriented XML IR systems. INEX provides large t...