Interactive text retrieval based on document similarities

Klose, Aljoscha; Nürnberger, Andreas; Kruse, Rudolf; Hartmann, G. K.; Richards, Michael

doi:10.1016/s1464-1895(00)00100-9

Cited by 25 publications

(14 citation statements)

References 7 publications

(7 reference statements)

“…Empirical results showing that term entropy is good for index term selection can be found in [68]. Thus, we use term entropy as a term weighting method for highlighting appropriate terms in representing a time partition.…”

Section: Temporal Entropy (mentioning)

confidence: 99%

Time-aware approaches to information retrieval

Kanhabua

2012

SIGIR Forum

In this thesis, we address major challenges in searching temporal document collections. In such collections, documents are created and/or edited over time. Examples of temporal document collections are web archives, news archives, blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based only on term matching can give unsatisfactory results when searching temporal document collections. The reason for this is twofold: the contents of documents are strongly time-dependent, i.e., documents are about events that happened in particular time periods, and a query representing an information need can be time-dependent as well, i.e., a temporal query. Our contributions in this thesis are different time-aware approaches within three topics in IR: content analysis, query analysis, and retrieval and ranking models. In particular, we aim at improving the retrieval effectiveness by 1) analyzing the contents of temporal document collections, 2) performing an analysis of temporal queries, and 3) explicitly modeling the time dimension into retrieval and ranking. Leveraging the time dimension in ranking can improve the retrieval effectiveness if information about the creation or publication time of documents is available. In this thesis, we analyze the contents of documents in order to determine the time of non-timestamped documents using temporal language models. We subsequently employ the temporal language models for determining the time of implicit temporal queries, and the determined time is used for re-ranking search results in order to improve the retrieval effectiveness. We study the effect of terminology changes over time and propose an approach to handling terminology changes using time-based synonyms. In addition, we propose different methods for predicting the effectiveness of temporal queries, so that a particular query enhancement technique can be performed to improve the overall performance.
When the time dimension is incorporated into ranking, documents will be ranked according to both textual and temporal similarity. In this case, time uncertainty should also be taken into account. Thus, we propose a ranking model that considers the time uncertainty, and improve ranking by combining multiple features using learning-to-rank techniques. Through extensive evaluation, we show that our proposed time-aware approaches outperform traditional retrieval methods and improve the retrieval effectiveness in searching temporal document collections.

“…Well known examples are Vivisimo [42] and Grokker [7]. Although their underlying clustering logics are not fully disclosed, documents and evidences imply that these systems share many features with such known research systems as the Grouper system [21,2] and Lingo/Carrot Search [12,15]. Recently a new algorithm to classify search results was proposed.…”

Section: CRM and Document Clustering (mentioning)

confidence: 99%

Designing evolving user profile in e-CRM with dynamic clustering of Web documents

Mahdavi

Cho

Shirazi

et al.

2008

Data & Knowledge Engineering

“…A simple but very efficient method in this direction is to extract keywords based on their entropy. For instance, in the approach discussed in [18], for each word k in the vocabulary the entropy as defined by [22] was computed:…”

Section: Methods (mentioning)

confidence: 99%

“…That is, of two words occurring equally often the one with the higher entropy is preferred. Empirically this procedure has proven to yield a set of relevant words that are suited to serve as index terms [18]. In order to obtain a fixed number of terms that cover the document collection well, we applied a greedy strategy: from an arbitrary document in the collection select the term with the highest relative entropy as an index term.…”

Section: Methods (mentioning)

confidence: 99%

Fuzzy Learning Vector Quantization with Size and Shape Parameters

Borgelt

Nürnberger

Kruse

The 14th IEEE International Conference on Fuzzy Systems, 2005. FUZZ '05.

Abstract: We study an extension of fuzzy learning vector quantization that draws on ideas from the more sophisticated approaches to fuzzy clustering, enabling us to find fuzzy clusters of ellipsoidal shape and differing size with a competitive learning scheme. This approach may be seen as a kind of online fuzzy clustering, which can have advantages w.r.t. the execution time of the clustering algorithm. We demonstrate the usefulness of our approach by applying it to document collections, which are, in general, difficult to cluster due to the high number of dimensions and the special distribution characteristics of the data.
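
The entropy-based keyword extraction mentioned in the Methods statements above can be illustrated with a small sketch. The quoted statement cuts off before the formula, so the weight used here is an assumption and not necessarily the definition from [22]: a commonly used normalized form, W_k = 1 + (1 / log2 n) * sum_j p_jk * log2 p_jk, where p_jk is the share of term k's occurrences that fall in document j and n is the number of documents. The function names `term_entropy_weights` and `best_term` are hypothetical, and only the first selection step of the greedy strategy is shown because the rest of the description is truncated.

```python
import math
from collections import Counter


def term_entropy_weights(docs):
    """Return {term: weight} for a list of tokenized documents.

    Assumed formula (not confirmed by the quoted source):
    W_k = 1 + (1 / log2 n) * sum_j p_jk * log2 p_jk,
    with p_jk = tf_jk / sum_l tf_lk. The weight lies in [0, 1].
    """
    n = len(docs)
    if n < 2:
        raise ValueError("need at least two documents")
    per_doc = [Counter(doc) for doc in docs]
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)

    weights = {}
    for term, total in totals.items():
        acc = 0.0
        for counts in per_doc:
            tf = counts.get(term, 0)
            if tf:  # zero counts contribute nothing to the sum
                p = tf / total
                acc += p * math.log2(p)
        weights[term] = 1.0 + acc / math.log2(n)
    return weights


def best_term(doc, weights):
    """Pick the highest-weighted term of one document (first greedy step)."""
    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(set(doc)), key=lambda t: weights[t])


docs = [["fuzzy", "cluster"], ["fuzzy", "entropy"]]
weights = term_entropy_weights(docs)
chosen = best_term(docs[0], weights)
```

Under this assumed weight, a term spread evenly over all documents scores 0 and a term confined to a single document scores 1, so `chosen` is the term of the first document that discriminates best.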
