Temporal Query Expansion Using a Continuous Hidden Markov Model

Rao, Jinfeng; Lin, Jimmy

doi:10.1145/2970398.2970424

Cited by 9 publications

(13 citation statements)

References 16 publications

“…Significance testing was conducted with other related methods for comparison but [14], [25] (omitted due to the unavailability of results file and the limitation of reproducing accurate results). For allrel criteria, significant differences were observed in our method compared to [7], [8], and baseline in terms of P@30 and MAP. We obtained a competitive performance in NDCG@30, although our result is statistically indistinguishable with related methods [8], [49].…”

Section: Comparison With Related Work (mentioning)

confidence: 91%

“…Amodeo et al [23] detected bursts for timed query expansion using Rocchio's pseudo relevance feedback. More recently, Rao et al [7] utilized the continuous hidden markov model (cHMM) to identify documents that occur in bursty temporal clusters.…”

Section: Burst-aware Score (Bs) (mentioning)

confidence: 99%

“…Significance testing was conducted with other related methods for comparison but [14] (omitted due to the unavailability of results file and the limitation of reproducing accurate results). For allrel criteria, significant differences were observed in our method compared to [3]-[7], and baseline in terms of all evaluation measures. For highrel criteria, significant differences were observed in [4], [6] and baseline in terms of P@30 and in [4]-[6] and baseline in terms of MAP.…”

Table fragment embedded in the excerpt (column headers not recoverable): [3]: 0.4830†, 0.2741†, ---; Metzler et al [4]: 0.4551†, 0.2210†, 0.4922†, 0.1434†, 0.1582†; Amati et al [6]: 0.4401†, 0.2318†, 0.5086†, 0.1495†, 0.2048†; Rao et al [7]: 0.4388†, 0.4024†, ---; Liang et al [5]: 0 (row truncated).

Section: Comparison With Related Work (mentioning)

confidence: 99%

“…Moreover, we utilized a rich set of account related, twitter-specific, and popularity-based features to quantify the document quality. Liang et al [5] proposed several temporal features for tweet re-ranking and Rao et al [7] utilized the continuous hidden markov model (cHMM) to identify documents for query expansion that occur in bursty temporal clusters. However, they didn't estimate the temporal dimension of the query, though some queries are temporally insensitive.…”

Section: Comparison With Related Work (mentioning)

confidence: 99%

Microblog Retrieval Using Ensemble of Feature Sets through Supervised Feature Selection

Chy

Ullah

Aono

2017

IEICE Trans. Inf. & Syst.

Abu Nowshed CHY, Md Zia ULLAH, Nonmembers, and Masaki AONO, Member

SUMMARY: Microblog, especially twitter, has become an integral part of our daily life for searching latest news and events information. Due to the short length characteristics of tweets and frequent use of unconventional abbreviations, content-relevance based search cannot satisfy user's information need. Recent research has shown that considering temporal and contextual aspects in this regard has improved the retrieval performance significantly. In this paper, we focus on microblog retrieval, emphasizing the alleviation of the vocabulary mismatch, and the leverage of the temporal (e.g., recency and burst nature) and contextual characteristics of tweets. To address the temporal and contextual aspect of tweets, we propose new features based on query-tweet time, word embedding, and query-tweet sentiment correlation. We also introduce some popularity features to estimate the importance of a tweet. A three-stage query expansion technique is applied to improve the relevancy of tweets. Moreover, to determine the temporal and sentiment sensitivity of a query, we introduce query type determination techniques. After supervised feature selection, we apply random forest as a feature ranking method to estimate the importance of selected features. Then, we make use of ensemble of learning to rank (L2R) framework to estimate the relevance of query-tweet pair. We conducted experiments on TREC Microblog 2011 and 2012 test collections over the TREC Tweets2011 corpus. Experimental results demonstrate the effectiveness of our method over the baseline and known related works in terms of precision at 30 (P@30), mean average precision (MAP), normalized discounted cumulative gain at 30 (NDCG@30), and R-precision (R-Prec) metrics.

key words: microblog search, temporal information retrieval, query expansion, feature selection, learning to rank, time-aware ranking

Introduction

Nowadays, microblog web sites are not only the places in maintaining the social relationships, but also act as a valuable information source. Everyday lots of users turn into microblog sites for sharing their views, opinions, experiences, important news, and also want to get some information what is happening around the world. Among several microblog sites, Twitter is now the most popular, where lots of users post tweets whenever a notable event occurs. That is why; information retrieval in twitter has made a hit with a lot of complaisance. By searching tweets, users find temporally relevant information, such as breaking news and real-time events [1]. That means, freshness (i.e. recency) of the tweet with respect to query time is an important factor of relevance. Another important characteristic of twitter is that people tends to post about a topic within a specific period of time (i.e. bursty nature). For example, when the breakup news of famous band "White Stripes" published on 2nd Feb, 2011, many people post tweets about this topic on that day. That is why; posts that are generate...

“…There is a long thread of research utilizing the query expansion (QE) to mitigate the vocabulary mismatch problem in microblog retrieval [4], [5], [6], [7], [8]. Most of these methods are based on the pseudo-relevance feedback (PRF) and select the terms from the top retrieved tweets as PRF assumes the top retrieved tweets are relevant.…”

Section: Introduction (mentioning)

confidence: 99%
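
The pseudo-relevance feedback idea in the excerpt above (treat the top retrieved tweets as relevant and draw expansion terms from them) can be illustrated with a minimal sketch. This is not the method of any paper listed here: the function name `prf_expansion_terms`, the whitespace tokenizer, the document-frequency scoring, and the toy tweets are all assumptions made for illustration.

```python
from collections import Counter


def prf_expansion_terms(query, ranked_docs, top_k=3, n_terms=2):
    """Pick expansion terms from the top_k retrieved documents.

    Hypothetical helper: assumes the top_k documents are relevant (the PRF
    assumption) and scores candidate terms by how many of them contain the term.
    """
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        # Count each term once per document so one long tweet cannot dominate.
        counts.update(set(doc.lower().split()) - query_terms)
    # Sort by document frequency (descending), then alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]


# Toy ranked list (made-up tweets); only the first three are used as feedback.
ranked_docs = [
    "white stripes breakup announced today",
    "white stripes band breakup news",
    "jack white talks breakup",
    "unrelated tweet about lunch",
]
expanded = prf_expansion_terms("white stripes", ranked_docs)
```

Real systems typically weight candidate terms with a retrieval model rather than raw document frequency; the sketch only shows where the "top retrieved tweets are relevant" assumption enters.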

Query Expansion for Microblog Retrieval Focusing on an Ensemble of Features

Chy

Ullah

Aono

2019

Journal of Information Processing

In microblog search, vocabulary mismatch is a persisting problem due to the brevity of tweets and frequent use of unconventional abbreviations. One way of alleviating this problem is to reformulate the query via query expansion. However, finding good expansion terms for a given query is a challenging task. In this paper, we present a query expansion framework, where supervised learning is adopted for selecting expansion terms. Upon retrieving tweets by our proposed topic modeling based query expansion, we utilize the pseudo-relevance feedback and a new temporal relatedness approach to select the candidate tweets. Next, we devise several new features to select the temporally and semantically relevant expansion terms by leveraging the temporal, word embedding, and sentiment association of candidate term and query. Moreover, we also utilize the lexical and twitter specific features to quantify the term relatedness. After supervised feature selection using regularized regression, we estimate the feature importance by applying random forest. Then, we make use of a learning-to-rank (L2R) framework to rank the candidate expansion terms. Results of extensive experiments on TREC Microblog 2011 and 2012 test collections over the Tweets2011 corpus show that our proposed method outperforms the baseline and competitive query expansion methods.

Time segment language model for microblog retrieval

Han

Kong

2021

Neural Comput & Applic

Related studies have shown that the time characteristics of microblog can improve retrieval performance. However, these researches mainly focus on the time distribution of tweets related to a given query. And this single time characteristics might not be sufficient to reflect time characteristics of microblog. Inspired by the recent success of time-based language models for microblog retrieval, this paper proposes a time segment language model (TSLM) to model the time characteristics of microblog. Briefly, TSLM constructs the language model of each time segment to model the probability distribution over sequences of words for each different time segment. Based on TSLM, the time distribution of terms (tDT), the time distribution of queries (tDQ) and the time distribution of documents (tDD) are proposed. Furthermore, TSLM is exploited to estimate the query model, the document model and compute the similarity between query and document. The experimental results on the Tweets2011 corpus show that the proposed approaches outperform several state-of-the-art baselines.
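
To make the notion of a per-segment language model concrete, here is a minimal sketch that builds one smoothed unigram distribution per time segment. It is an illustration under assumptions, not the TSLM estimator from the paper: the fixed-width segmentation, the additive smoothing parameter `alpha`, the function name `segment_language_models`, and the toy timestamps are all hypothetical.

```python
from collections import Counter, defaultdict


def segment_language_models(docs, segment_seconds, alpha=1.0):
    """Build one unigram language model per fixed-width time segment.

    docs: list of (timestamp_seconds, text) pairs.
    Returns {segment_index: {term: probability}} over the shared vocabulary,
    using additive smoothing (an assumption, not the paper's estimator).
    """
    counts = defaultdict(Counter)
    vocab = set()
    for ts, text in docs:
        terms = text.lower().split()
        counts[int(ts // segment_seconds)].update(terms)
        vocab.update(terms)
    models = {}
    for seg, seg_counts in counts.items():
        total = sum(seg_counts.values()) + alpha * len(vocab)
        models[seg] = {t: (seg_counts[t] + alpha) / total for t in vocab}
    return models


# Toy data: two tweets in the first hour, one in the second.
docs = [(10, "band breakup news"), (20, "breakup rumor"), (3700, "concert tickets")]
models = segment_language_models(docs, segment_seconds=3600)
```

With this toy data, "breakup" receives a higher probability in segment 0 than in segment 1, which is the kind of time-dependent term distribution the abstract describes.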
