Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval 2016
DOI: 10.1145/2911451.2914719

Enhancing First Story Detection using Word Embeddings

Abstract: In this paper we show how word embeddings can be used to increase the effectiveness of a state-of-the-art Locality Sensitive Hashing (LSH)-based first story detection (FSD) system over a standard tweet corpus. Vocabulary mismatch, in which related tweets use different words, is a serious hindrance to the effectiveness of a modern FSD system. In this case, a tweet could be flagged as a first story even if a related tweet, which uses different but synonymous words, was already returned as a first story. In this …
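LSH-based FSD systems of the kind the abstract refers to are commonly described as scoring each incoming tweet by its cosine distance to the nearest previously seen tweet and flagging it as a first story when that distance exceeds a threshold. The minimal sketch below assumes that formulation but replaces the LSH lookup with brute-force search. The whitespace tokeniser, raw term counts, the 0.5 threshold, and the example tweets are illustrative choices, not details taken from the paper. It reproduces the vocabulary-mismatch failure the abstract describes: the second tweet reports the same event in different words, shares no terms with the first, and is therefore flagged as a new story.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_stories(tweets, threshold=0.5):
    """Flag tweets whose novelty (1 - max cosine to any earlier tweet) exceeds threshold.

    Brute-force nearest-neighbour search; an LSH-based system would only compare
    against earlier tweets that collide with the new one in its hash buckets.
    """
    seen = []
    flags = []
    for text in tweets:
        vec = Counter(text.lower().split())  # naive whitespace tokenisation (assumption)
        max_sim = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(1.0 - max_sim > threshold)
        seen.append(vec)
    return flags


tweets = [
    "earthquake hits city",         # hypothetical example tweets
    "tremor strikes town",          # same event, different words
    "earthquake hits city centre",  # same event, same words
]
print(first_stories(tweets))  # [True, True, False]
```

The third tweet reuses the first tweet's wording, so its novelty score (about 0.13) falls below the threshold and it is not flagged, whereas the synonymous second tweet slips through as a spurious first story.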

Cited by 24 publications (26 citation statements)
References: 10 publications

“…This approach supports arbitrary similarity functions, but it suffers from the problem of the vocabulary mismatch [4,63,23]. The second approach involves carrying out the k-NN search via LSH [53,65,46,37]. It is most appropriate for the cosine similarity.…”
Section: Discussion and Related Work (mentioning, confidence 99%)
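The quoted claim that LSH is most appropriate for cosine similarity follows from random-hyperplane (sign random projection) hashing: each hash bit records which side of a random hyperplane a vector falls on, and two vectors separated by angle θ agree on a bit with probability 1 - θ/π. The sketch below checks that relationship empirically. The function names, the 2,000-bit sample size, and the seed are illustrative assumptions; this is not the hashing code of any system cited here.

```python
import math
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian normal vectors, one per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0 for plane in planes]


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))


def collision_rate(u, v, n_bits=2000, seed=0):
    """Empirical fraction of hash bits on which u and v agree."""
    planes = make_hyperplanes(len(u), n_bits, seed)
    su, sv = signature(u, planes), signature(v, planes)
    return sum(a == b for a, b in zip(su, sv)) / n_bits


u = [1.0, 0.0, 0.0]
v = [1.0, 1.0, 0.0]
predicted = 1 - angle(u, v) / math.pi  # 45-degree angle, so 0.75
print(round(predicted, 3), collision_rate(u, v))
```

In an LSH index, several such bits are concatenated into a bucket key, so tweets with a small angle between them tend to land in the same bucket and only those candidates need an exact cosine comparison. How many bits and tables to use is a tuning choice that the quoted passage does not specify.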
“…demonstrated encouraging results using the word2vec Skip-gram model to generate event timelines from tweets. Moran et al (2016) achieved an improvement over the state-of-the-art first story detection (FSD) results by expanding the tweets with their semantically related terms using word2vec.…”
Section: Neural Embeddings (mentioning, confidence 99%)
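The expansion idea in the quoted passage, adding semantically related terms from a word2vec model to each tweet, can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure from Moran et al. (2016): the three-dimensional vectors are made up (a real system would load trained word2vec embeddings), and the neighbour count k=1, the 0.9 similarity cut-off, and all helper names are hypothetical. After expansion, two tweets that shared no words now overlap, so their bag-of-words cosine similarity rises from 0.

```python
import math
from collections import Counter

# Hypothetical 3-d toy embeddings; a real system would load trained word2vec vectors.
EMBEDDINGS = {
    "earthquake": [0.90, 0.10, 0.00],
    "tremor": [0.85, 0.20, 0.05],
    "hits": [0.10, 0.90, 0.10],
    "strikes": [0.15, 0.85, 0.10],
    "city": [0.05, 0.10, 0.95],
    "town": [0.10, 0.05, 0.90],
}


def cos(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def neighbours(word, k=1, min_sim=0.9):
    """Up to k most similar vocabulary words to `word` with similarity >= min_sim."""
    if word not in EMBEDDINGS:
        return []  # out-of-vocabulary words are not expanded
    scored = sorted(
        ((cos(EMBEDDINGS[word], vec), other) for other, vec in EMBEDDINGS.items() if other != word),
        reverse=True,
    )
    return [other for sim, other in scored[:k] if sim >= min_sim]


def expand(tweet, k=1, min_sim=0.9):
    """Return the tweet's tokens followed by the embedding neighbours of each token."""
    tokens = tweet.lower().split()
    return tokens + [n for t in tokens for n in neighbours(t, k, min_sim)]


def bow_cosine(x, y):
    """Cosine similarity between two token lists treated as bags of words."""
    cx, cy = Counter(x), Counter(y)
    dot = sum(count * cy[t] for t, count in cx.items())
    norm = math.sqrt(sum(c * c for c in cx.values())) * math.sqrt(sum(c * c for c in cy.values()))
    return dot / norm


t1, t2 = "earthquake hits city", "tremor strikes town"
print(round(bow_cosine(t1.split(), t2.split()), 3))  # 0.0: no shared words
a, b = expand(t1), expand(t2)
print(round(bow_cosine(a, b), 3))  # 1.0: expansions cover the same six terms
```

With real embeddings the neighbour lists are noisier, and expansion can also pull unrelated tweets closer together, which is a likely reason to cap k and apply a similarity threshold as this sketch does.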
“…They are only related in that event detection must be performed as a part of finding first stories. Moreover, while [19] focused on detecting first story with paraphrasing, our concern is on identifying and classifying news events that are worth presenting to the user.…”
Section: Related Work (mentioning, confidence 99%)