A testbed for people searching strategies in the WWW

Artiles, Javier; Gonzalo, Julio; Verdejo, Felisa

doi:10.1145/1076034.1076132

Cited by 43 publications

(46 citation statements)

References 1 publication


“…Since the goal of this work is aimed towards evaluating the person clustering hypothesis in a very general setting, we have selected these clustering methods because they are representative of the types already tried. For instance, Artiles et al [2] use a similar representation of documents with agglomerative clustering technique to obtain a baseline for a pilot test collection for this task. However, our work differs because we focus on exploring how document clustering performs for this task.…”

Section: Related Work (mentioning)

confidence: 99%

“…The harmonic mean (α = 0.5) was used for the final ranking of systems at SemEval, and F_0.2 was also reported as an additional measure, which gives more importance to the inverse purity aspect (α = 0.2). Artiles et al [2] argue that the rationale for using F_0.2, from a user's point of view, is that "it is easier to discard a few incorrect web pages in a cluster which has all the information needed, than having to collect the relevant information across many different clusters." We decided to also report on F_0.8, a measure which gives more importance to the purity aspect (α = 0.8).…”

Section: Performance Measures (mentioning)

confidence: 99%
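
The measures named in the statement above can be illustrated with a minimal sketch. It assumes the usual weighted harmonic mean F_α = 1 / (α/P + (1 − α)/IP), where P is purity and IP is inverse purity, and it assumes a hard partition (each page in exactly one cluster). The function names `purity`, `inverse_purity` and `f_alpha` and the toy data are hypothetical and do not come from any of the cited papers.

```python
def purity(clusters, gold):
    """Share of items that lie in the best-matching gold class of their cluster.

    Assumes `clusters` and `gold` are lists of disjoint sets over the same items.
    """
    n = sum(len(c) for c in clusters)
    return sum(max(len(c & g) for g in gold) for c in clusters) / n


def inverse_purity(clusters, gold):
    """Purity with the roles of system clusters and gold classes swapped."""
    return purity(gold, clusters)


def f_alpha(p, ip, alpha=0.5):
    """Weighted harmonic mean; alpha is the weight given to purity."""
    if p == 0 or ip == 0:
        return 0.0
    return 1.0 / (alpha / p + (1.0 - alpha) / ip)


# Toy data: two namesakes in the gold standard, one merged system cluster.
gold = [{1, 2, 3}, {4, 5}]
merged = [{1, 2, 3, 4, 5}]

p = purity(merged, gold)           # low: the cluster mixes both people
ip = inverse_purity(merged, gold)  # perfect: no person is split up
scores = {a: f_alpha(p, ip, a) for a in (0.2, 0.5, 0.8)}
```

On this toy input the merged cluster scores best under α = 0.2 and worst under α = 0.8, which matches the quoted rationale: a low α tolerates a few wrong pages in a cluster as long as each person's pages stay together.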

“…Given the popularity of people names in web queries, the problem of ambiguous person names is encountered frequently as a person name may have hundreds of distinct referents. Indeed, according to U.S. Census Bureau figures approximately 90,000 different names are shared by around 100 million people (as cited by [2]). On the web, a query for a common name often yields thousands of pages referring to different namesakes [9].…”

Section: Introduction (mentioning)

confidence: 99%


Resolving Person Names in Web People Search

Balog, Azzopardi, Rijke. 2009. Weaving Services and People on the World Wide Web


Disambiguating person names in a set of documents (such as a set of web pages returned in response to a person name) is a key task for the presentation of results and the automatic profiling of experts. With largely unstructured documents and an unknown number of people with the same name the problem presents many difficulties and challenges. This chapter treats the task of person name disambiguation as a document clustering problem, where it is assumed that the documents represent particular people. This leads to the person cluster hypothesis, which states that similar documents tend to represent the same person. Single Pass Clustering, k-Means Clustering, Agglomerative Clustering and Probabilistic Latent Semantic Analysis are employed and empirically evaluated in this context. On the SemEval 2007 Web People Search it is shown that the person cluster hypothesis holds reasonably well and that the Single Pass Clustering and Agglomerative Clustering methods provide the best performance.


“…Many celebrities and experts from various fields are referred by their original names on web. Most of the queries to web search engines include person names [1] [2]. For example, people might use "Michel Jackson" as a query on search engine to know about him.…”

Section: Introduction 11 Information Retrieval (mentioning)

confidence: 99%

Automatic Discovery of Association Orders between Name and Aliases from the Web using Anchor Texts-based Co-occurrences

B, Jayabhaduri. 2012. IJCA


Many celebrities and experts from various fields may have been referred by not only their personal names but also by their aliases on web. Aliases are very important in information retrieval to retrieve complete information about a personal name from the web, as some of the web pages of the person may also be referred by his aliases. The aliases for a personal name are extracted by previously proposed alias extraction method. In information retrieval, the web search engine automatically expands the search query on a person name by tagging his aliases for complete information retrieval thereby improving recall in relation detection task and achieving a significant mean reciprocal rank (MRR) of search engine. For the further substantial improvement on recall and MRR from the previously proposed methods, our proposed method will order the aliases based on their associations with the name using the definition of anchor texts-based co-occurrences between name and aliases in order to help the search engine tag the aliases according to the order of associations. The association orders will automatically be discovered by creating an anchor texts-based co-occurrence graph between name and aliases. Ranking support vector machine (SVM) will be used to create connections between name and aliases in the graph by performing ranking on anchor texts-based co-occurrence measures. The hop distances between nodes in the graph will lead to have the associations between name and aliases. The hop distances will be found by mining the graph. The proposed method will outperform previously proposed methods, achieving substantial growth on recall and MRR.


“…Around 30% of search engine queries include personal names [1]. However, retrieving information about a person merely using his or her real names is insufficient when that person has nicknames.…”

Section: Introduction (mentioning)

confidence: 99%

Automatically Extracting Personal Name Aliases from the Web

Bollegala, Honma, Matsuo et al. 2008. Advances in Natural Language Processing


Abstract. Extracting aliases of an entity is important for various tasks such as identification of relations among entities, web search and entity disambiguation. To extract relations among entities properly, one must first identify those entities. We propose a novel approach to find aliases of a given name using automatically extracted lexical patterns. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts to design a word co-occurrence model and use it to define various ranking scores to measure the association between a name and a candidate alias. The ranking scores are integrated with page-count-based association measures using support vector machines to leverage a robust alias detection method. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names, achieving a statistically significant mean reciprocal rank of 0.6718. Experiments carried out using a dataset of location names and Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different types of named entities and for other languages. Moreover, the aliases extracted using the proposed method improve recall by 20% in a relation-detection task.


A testbed for people searching strategies in the WWW

Abstract: This paper describes the creation of a testbed to evaluate people searching strategies on the World-Wide-Web. This task involves resolving person names' ambiguity and locating relevant information characterising every individual under the same name.
