2012
DOI: 10.1007/978-3-642-28997-2_13

Result Disambiguation in Web People Search

Abstract: We study the problem of disambiguating the results of a web people search engine: given a query consisting of a person name plus the result pages for this query, find correct referents for all mentions by clustering the pages according to the different people sharing the name. While the problem has been studied extensively, we discover that the increasing availability of results retrieved from social media platforms causes state-of-the-art methods to break down. We analyze the problem and propose a dual strate…

Citations: cited by 9 publications (10 citation statements)
References: 19 publications
“…The most popular types of features were: bag of words, named entities (NEs), and noun phrases. These features were usually weighted using the well‐known term frequency‐inverse document frequency (TF‐IDF) scheme (Berendsen et al, ; Chen & Martin, ; Liu, Lu, & Xu, ; Yoshida, Ikeda, Ono, Sato, & Nakagawa, ). Balog et al () to compare a VSM representation with respect to using probabilistic Latent Semantic Indexing (pLSI), a topic model representation, showing that the first option reaches significantly better results.…”
Section: Related Work (mentioning)
confidence: 99%
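The TF-IDF weighting of bag-of-words features mentioned in the statement above can be illustrated with a minimal sketch. This is a generic textbook formulation (relative term frequency times the log of inverse document frequency, compared by cosine similarity), not the configuration used in any of the cited papers; the function names `tfidf_vectors` and `cosine` and the three toy "pages" are assumptions made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (term count / document length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical result pages for the same person-name query.
pages = [
    "john smith professor computer science".split(),
    "john smith professor university lecture".split(),
    "john smith guitar band tour".split(),
]
vecs = tfidf_vectors(pages)
same_person = cosine(vecs[0], vecs[1])
different_person = cosine(vecs[0], vecs[2])
```

In this toy input the query terms occur on every page, so their inverse document frequency is zero and they contribute nothing to the similarity; only the distinguishing terms separate the pages, which is the property a clustering step would rely on.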
“…Regarding the corpora for evaluating PND in a web search scenario, in addition to the WePS datasets that we use in the experiment and are described in the Results and Discussion section, we can mention the corpus presented by Berendsen et al (), which was used to study the impact of social network webpages in this problem. However, the rankings contained in this corpus were not made up of the results of a query to a search engine, but the authors selected search results from several search engines to include mostly webpages from social platforms, which leads to a corpus that does not reflect a real query output.…”
Section: Related Work (mentioning)
confidence: 99%
“…For each of the methods, we tested many different configurations. For example, for fairness reasons, one of the configurations of K-Means used the Wikipedia entities as initial centroids similarly as proposed in [8]. However, random seeds (footnote 7) led to better results than the above… [Footnote 6: http://www.cs.waikato.ac.nz/ml/weka/. Footnote 7: We used the average results over ten repetitions.]”
Section: Comparison With Clustering Techniques (mentioning)
confidence: 99%
“…This representation allows users to effectively zoom in and locate the documents of interest. It has been proved to facilitate the searching and browsing process [1]. This paper borrows Community Mining from Social Network Analysis to discover different topical coherent document groups and gives each document cluster descriptive labels.…”
Section: Introduction (mentioning)
confidence: 99%