The role of named entities in web people search

Artiles, Javier; Amigó, Enrique; Gonzalo, Julio

doi:10.3115/1699571.1699582

Cited by 32 publications

(25 citation statements)

References 19 publications

“…it focuses on the area where both clusters are closest to each other). A drawback of this method is that clusters may be merged due to single noisy elements being close to each other, but in practice it seems to be the best choice in problems related to ours [23,28,5]. …”

Section: Learning a Similarity Function (mentioning)

confidence: 99%
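
The linkage criterion described in this statement (cluster similarity taken from the area where two clusters are closest) matches what is usually called single-link agglomerative clustering. The following is only a minimal sketch of that idea, not the citing paper's implementation: the function names, the toy `closeness` similarity, and the stopping threshold are all illustrative assumptions.

```python
from itertools import combinations


def single_link_clusters(items, similarity, threshold):
    """Greedy agglomerative clustering with a single-link criterion.

    Hypothetical helper: merging stops once the best link between any two
    clusters falls below `threshold`.
    """
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best = None  # (link similarity, index i, index j)
        for i, j in combinations(range(len(clusters)), 2):
            # Single link: two clusters are as similar as their closest members.
            link = max(similarity(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or link > best[0]:
                best = (link, i, j)
        if best[0] < threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters


def closeness(a, b):
    """Toy similarity for numbers (an assumption for this example)."""
    return 1.0 / (1.0 + abs(a - b))


# 0 and 3 are not similar to each other (closeness 0.25), yet they end up in
# the same cluster because 1 and 2 link them step by step.
result = single_link_clusters([0, 1, 2, 3, 10], closeness, threshold=0.5)
print(result)
```

The toy run also illustrates the drawback the statement mentions: a chain of individually close elements is enough to merge items that are far apart overall.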

“…Following the methodology proposed in [5] for a different clustering problem, we model the problem as a binary classification task: given a pair of tweets d1, d2 , the system must decide whether the tweets belong to the same topic (true) or not (false). Each pair of tweets is represented as a set of features (for instance, term overlapping between both tweets), which are used to feed a machine learning algorithm that learns a similarity function.…”

Section: Modeling Similarity As a Classification Task (mentioning)

confidence: 99%
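
To make the pairwise formulation concrete, here is a hedged sketch of how training instances for such a binary classifier could be built. It assumes a single feature, Jaccard term overlap over whitespace tokens, which is only one possible reading of "term overlapping"; the function names and the toy tweets are hypothetical and do not come from the cited papers.

```python
from itertools import combinations


def term_overlap(d1, d2):
    """Jaccard overlap between the lower-cased token sets of two texts."""
    t1, t2 = set(d1.lower().split()), set(d2.lower().split())
    if not t1 and not t2:
        return 0.0
    return len(t1 & t2) / len(t1 | t2)


def training_pairs(tweets, topics):
    """Turn annotated tweets into (feature vector, same-topic label) pairs."""
    pairs = []
    for i, j in combinations(range(len(tweets)), 2):
        features = [term_overlap(tweets[i], tweets[j])]
        label = topics[i] == topics[j]  # True: same topic, False: different
        pairs.append((features, label))
    return pairs


tweets = [
    "new phone battery problem",
    "battery problem again",
    "store opens in Madrid",
]
topics = ["battery", "battery", "stores"]
for features, label in training_pairs(tweets, topics):
    print(features, label)
```

Any binary classifier that outputs a confidence score could then be trained on these pairs, and that score could serve as the learned similarity between two tweets.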

Learning similarity functions for topic detection in online reputation monitoring

Spina

Gonzalo

Amigó

2014

Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval

Self Cite

Reputation management experts have to monitor Twitter (among other sources) constantly and decide, at any given time, what is being said about the entity of interest (a company, organization, personality...). Solving this reputation monitoring problem automatically as a topic detection task is both essential (manual processing of data is either costly or prohibitive) and challenging (topics of interest for reputation monitoring are usually fine-grained and suffer from data sparsity). We focus on a solution for the problem that (i) learns a pairwise tweet similarity function from previously annotated data, using all kinds of content-based and Twitter-based features; (ii) applies a clustering algorithm on the previously learned similarity function. Our experiments indicate that (i) Twitter signals can be used to improve the topic detection process with respect to using content signals only; (ii) learning a similarity function is a flexible and efficient way of introducing supervision in the topic detection clustering process. The performance of our best system is substantially better than state-of-the-art approaches and gets close to the inter-annotator agreement rate. A detailed qualitative inspection of the data further reveals two types of topics detected by reputation experts: reputation alerts / issues (which usually spike in time) and organizational topics (which are usually stable across time).

“…As named entity recognition (NER) is used in most approaches, Artiles et al. investigated which document features contribute to person name disambiguation and reported that NER only makes a small contribution [4].…”

Section: Related Work and Discussion (mentioning)

confidence: 99%

How do humans distinguish different people with identical names on the web?

Murakami

Miyake

2012

Proceedings of the 21st ACM International Conference on Information and Knowledge Management

This research investigates how humans distinguish different people with identical names on the web to improve web people search. We asked subjects to classify 20 pages of web people search results for each of 20 person names and analyzed their decision processes through questionnaire, protocol analysis, and interview. We found that keywords, vocations, works (for a real person, works are those made by the individual and, for a fictional person, works are those in which the individual appears), facial images, and the names of related people are important for distinguishing individuals. We proposed a model for distinguishing individuals and a knowledge-structure model based on the experiment's results.

“…Two evaluation metrics are employed during the unsupervised evaluation in order to estimate the quality of the clustering solutions, the V-measure [24] and the paired F-Score [25]. V-Measure assesses the quality of a clustering by measuring its homogeneity (h) and its completeness (c).…”

Section: Evaluation Measures (mentioning)

confidence: 99%
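
The excerpt names homogeneity (h) and completeness (c) without giving formulas. As a rough sketch, the version below assumes the commonly used entropy-based definitions, h = 1 - H(C|K)/H(C) and c = 1 - H(K|C)/H(K), combined as an unweighted harmonic mean, where C are the gold classes and K the induced clusters. These definitions and all function names are assumptions, not taken from the quoted text.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) estimated from two aligned label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(gold, predicted):
    """Return (homogeneity, completeness, V) with equal weighting (assumed)."""
    h_gold, h_pred = entropy(gold), entropy(predicted)
    # By convention a zero-entropy labeling is treated as perfectly
    # homogeneous / complete, which also avoids dividing by zero.
    h = 1.0 if h_gold == 0 else 1.0 - conditional_entropy(gold, predicted) / h_gold
    c = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(predicted, gold) / h_pred
    v = 0.0 if h + c == 0 else 2 * h * c / (h + c)
    return h, c, v


gold = ["a", "a", "b", "b"]
print(v_measure(gold, [0, 0, 1, 1]))  # clusters match the gold classes
print(v_measure(gold, [0, 0, 0, 0]))  # everything lumped into one cluster
```

Putting every instance in a single cluster is trivially complete but not homogeneous, which is why the harmonic mean is reported rather than either component alone.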

“…In the paired F-Score [25] evaluation, the clustering problem is transformed into a classification problem [4]. A set of instance pairs is generated from the automatically induced clusters (F(K)), which comprises pairs of the instances found in each cluster.…”

Section: Evaluation Measures (mentioning)

confidence: 99%
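
The pair-based view in this statement can be sketched as follows. The excerpt only describes the pair set F(K) built from the induced clusters; this sketch additionally assumes an analogous pair set F(S) built from the gold standard, with precision = |F(K) ∩ F(S)| / |F(K)|, recall = |F(K) ∩ F(S)| / |F(S)|, and their harmonic mean as the score. Function names are hypothetical.

```python
from itertools import combinations


def same_group_pairs(labels):
    """All index pairs (i, j), i < j, whose instances share a label."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    return {pair for members in groups.values() for pair in combinations(members, 2)}


def paired_f_score(gold, predicted):
    """Harmonic mean of pair precision and pair recall (assumed definition)."""
    f_k = same_group_pairs(predicted)  # pairs from induced clusters, F(K)
    f_s = same_group_pairs(gold)       # pairs from gold classes, F(S)
    common = len(f_k & f_s)
    precision = common / len(f_k) if f_k else 0.0
    recall = common / len(f_s) if f_s else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["a", "a", "b", "b"]
predicted = [0, 0, 0, 1]
# F(K) = {(0,1), (0,2), (1,2)}, F(S) = {(0,1), (2,3)}: one shared pair,
# so precision is 1/3 and recall is 1/2.
print(paired_f_score(gold, predicted))
```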

A Quantitative Evaluation of Global Word Sense Induction

Apidianaki

Cruys

2011

Computational Linguistics and Intelligent Text Processing

Word sense induction (WSI) is the task aimed at automatically identifying the senses of words in texts, without the need for handcrafted resources or annotated data. Up till now, most WSI algorithms extract the different senses of a word 'locally' on a per-word basis, i.e. the different senses for each word are determined separately. In this paper, we compare the performance of such algorithms to an algorithm that uses a 'global' approach, i.e. the different senses of a particular word are determined by comparing them to, and demarcating them from, the senses of other words in a full-blown word space model. We adopt the evaluation framework proposed in the SemEval-2010 Word Sense Induction & Disambiguation task. All systems that participated in this task use a local scheme for determining the different senses of a word. We compare their results to the ones obtained by the global approach, and discuss the advantages and weaknesses of both approaches.
