Year: 2000
DOI: 10.1007/s007999900025
A probabilistic justification for using tf×idf term weighting in information retrieval

Abstract: This paper presents a new probabilistic model of information retrieval. The most important modeling assumption made is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well-known existing models of information retrieval, but is essential in the field of statistical natural language processing. Advances already made in statistical natural language processing will be used in this paper to formulate a probabilistic justification for using tf×idf term weighting.
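The weighting scheme the paper justifies can be sketched concretely. Below is a minimal tf×idf computation in Python; the toy corpus and function name are illustrative, not taken from the paper, and real systems typically add smoothing and length normalization.

```python
import math

def tf_idf(term, doc_tokens, corpus):
    """Classic tf×idf: raw term frequency in the document times
    the log inverse document frequency over the corpus."""
    tf = doc_tokens.count(term)
    df = sum(1 for doc in corpus if term in doc)  # document frequency
    return tf * math.log(len(corpus) / df) if df else 0.0

# Each document is an ordered sequence of single terms, matching
# the paper's central modeling assumption.
corpus = [
    ["probabilistic", "model", "of", "retrieval"],
    ["statistical", "language", "model"],
    ["retrieval", "model", "evaluation"],
]
print(tf_idf("retrieval", corpus[0], corpus))  # ≈ 0.405
```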

Cited by 158 publications (82 citation statements)
References 13 publications (26 reference statements)
“…This notion is for example expressed in the popular tf/idf family of formulae but is also implicit in the language modelling framework [10]. The same method can be applied to the video retrieval setting, in which each shared video corresponds to a distinct d. We assume a unigram collection model LM C comprised of all comments in C and dedicated document models LM d based on the comment thread of document d. Subsequently, we assume good descriptors of d can be determined by the termwise KL-divergence between both models (LM C and LM d ), identifying locally densely occurring terms w (those that display a high negative value of KL(w)).…”
Section: Related Work (citation type: mentioning)
confidence: 99%
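The termwise KL-divergence scoring described in the excerpt above can be sketched as follows. The unigram models LM_C and LM_d follow the quote; the Jelinek-Mercer smoothing, parameter values, and toy data are assumptions, and the quoted paper's sign convention is flipped relative to this sketch (the resulting ranking is the same).

```python
import math
from collections import Counter

def termwise_kl(doc_tokens, collection_tokens, alpha=0.9):
    """Per-term contribution to KL(LM_d || LM_C), where LM_d models
    one document's comment thread and LM_C models all comments.
    LM_d is smoothed with LM_C so every ratio is well defined;
    terms with large contributions occur locally densely in d."""
    lm_c = Counter(collection_tokens)
    lm_d = Counter(doc_tokens)
    n_c, n_d = len(collection_tokens), len(doc_tokens)
    scores = {}
    for w, freq in lm_d.items():
        p_c = lm_c[w] / n_c                            # collection model
        p_d = alpha * freq / n_d + (1 - alpha) * p_c   # smoothed doc model
        scores[w] = p_d * math.log(p_d / p_c)
    return scores

collection = "great video great song nice video".split()
doc = "great great great song".split()
ranked = sorted(termwise_kl(doc, collection).items(), key=lambda kv: -kv[1])
print(ranked[0][0])  # 'great' occurs locally densely in this thread
```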
“…One common scoring method that has been used for visual place recognition is known as TF-IDF (Term Frequency - Inverse Document Frequency (Hiemstra, 2000; Manning et al., 2008)), which creates vectors for each location where each element is the ratio between how common a word is within that location and how common the word is within the entire set of locations (Sivic and Zisserman, 2003). Locations can then be compared by finding the distance between their corresponding TF-IDF vectors.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
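A minimal sketch of the comparison described above, assuming a bag-of-visual-words representation per location; the cosine similarity, vocabulary, and toy data are illustrative, not taken from the cited papers.

```python
import math
from collections import Counter

def tfidf_vectors(locations):
    """locations: one list of visual words per location.
    Returns a sparse tf-idf weight dict for each location."""
    n = len(locations)
    df = Counter(w for loc in locations for w in set(loc))
    return [{w: (c / len(loc)) * math.log(n / df[w])
             for w, c in Counter(loc).items()}
            for loc in locations]

def cosine(u, v):
    """Cosine similarity between two sparse tf-idf vectors."""
    dot = sum(wt * v.get(w, 0.0) for w, wt in u.items())
    norm = lambda x: math.sqrt(sum(t * t for t in x.values())) or 1.0
    return dot / (norm(u) * norm(v))

# Visual words (e.g. quantized local descriptors) seen at each location.
locs = [["w1", "w2", "w2", "w3"], ["w2", "w2", "w3"], ["w4", "w5"]]
vecs = tfidf_vectors(locs)
print(cosine(vecs[0], vecs[1]))  # high: locations 0 and 1 share words
print(cosine(vecs[0], vecs[2]))  # 0.0: no shared visual words
```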
“…1, "Toy Story 3." In this third possibility, the trending topic is associated with a "Promoted" tweet -a hybrid tweet-advertisement which is displayed at the top of search results on relevant topics 5 . While the classification of a trending topic as consisting of spikes or chatter is helpful for the understanding of the nature of trending topics, it is not directly useful in the identification or classification of terms as trending topics.…”
Section: Problem Definitionmentioning
confidence: 99%
“…Put simply, the weight of a word in a document will be higher if the number of times the word occurs in the document is higher, or if the number of documents containing that word is lower; similarly, the weight will be lower if the number of times the word occurs in the document is lower, or if the number of documents containing that word is higher [5].…”
Section: B. TF-IDF (citation type: mentioning)
confidence: 99%
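The behaviour described in this excerpt follows directly from the standard weight w(t, d) = tf(t, d) × log(N / df(t)). A short worked example with made-up counts, assuming a corpus of N = 1000 documents:

```python
import math

N = 1000  # assumed corpus size (made-up for illustration)
weight = lambda tf, df: tf * math.log(N / df)

print(weight(10, 5))    # word frequent in doc, rare in corpus   -> ~52.98
print(weight(10, 900))  # word frequent in doc, common in corpus -> ~1.05
print(weight(1, 5))     # word rare in doc, rare in corpus       -> ~5.30
```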