2005
DOI: 10.1007/978-3-540-30586-6_24

Name Discrimination by Clustering Similar Contexts

Abstract: It is relatively common for different people or organizations to share the same name. Given the increasing amount of information available online, this results in the ever-growing possibility of finding misleading or incorrect information due to confusion caused by an ambiguous name. This paper presents an unsupervised approach that resolves name ambiguity by clustering the instances of a given name into groups, each of which is associated with a distinct underlying entity. The features we employ to …
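The general idea of clustering the instances of an ambiguous name into groups, one per underlying entity, can be illustrated with a small sketch. The abstract's description of the actual features is truncated, so everything below is an assumption for illustration only: plain bag-of-words contexts, cosine similarity, average-link agglomerative clustering, and a known number of entities `k`. The function names and the toy contexts are hypothetical; this is not the paper's method.

```python
import math
from collections import Counter


def context_vector(text):
    """First-order bag-of-words vector (word -> count) for one context."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_contexts(contexts, k):
    """Average-link agglomerative clustering of contexts into k groups.

    Returns a list of clusters, each a list of indices into `contexts`.
    """
    vecs = [context_vector(c) for c in contexts]
    clusters = [[i] for i in range(len(vecs))]

    def avg_sim(c1, c2):
        # Mean pairwise cosine similarity between members of two clusters.
        total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the most similar pair of clusters (a < b, so deleting b is safe).
        a, b = max(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: avg_sim(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


# Hypothetical contexts for the ambiguous name "miller".
contexts = [
    "george miller psychology memory seven plus or minus two",
    "george miller director mad max film",
    "miller psychology memory research chunking",
    "miller film director fury road",
]
print(cluster_contexts(contexts, k=2))  # → [[0, 2], [1, 3]]
```

Fixing `k` in advance is a simplification. An unsupervised system would typically also have to estimate how many distinct entities share the name.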

Cited by 94 publications (67 citation statements)
References 13 publications
“…E.g., Pedersen et al [22] propose a method based on clustering using second-order context vectors derived from singular value decomposition (SVD) on a bigram-document co-occurrence matrix. And Al-Kamha and Embley [1] study combinations of three different representation methods: attribute (factoid) based representations like those used in [20,23], link/citation-based, and content-based.…”
Section: Related Work (mentioning)
confidence: 99%
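A second-order context vector represents a context not by the words it contains but by averaging the co-occurrence vectors of those words, so two contexts can be similar even when they share no words. The sketch below is a simplification under stated assumptions: it builds a word-by-word co-occurrence matrix from a +/- `window` token span rather than the bigram-document matrix mentioned in the quote, and it omits the SVD step entirely. The function names and the toy corpus are hypothetical.

```python
from collections import defaultdict


def cooccurrence_rows(corpus, window=2):
    """Word -> {neighbour: count}, counting neighbours within +/- window tokens."""
    rows = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    rows[word][tokens[j]] += 1
    return rows


def second_order_vector(context, rows):
    """Average the co-occurrence rows of the context words found in `rows`.

    (The SVD reduction of the matrix, used in the cited method, is omitted.)
    """
    words = [w for w in context.lower().split() if w in rows]
    vec = defaultdict(float)
    for w in words:
        for feature, count in rows[w].items():
            vec[feature] += count / len(words)
    return dict(vec)


corpus = ["psychology memory research", "film director movie"]
rows = cooccurrence_rows(corpus)
# "memory test" and "research" share no words, yet their second-order
# vectors overlap on the feature "psychology".
print(second_order_vector("memory test", rows))
print(second_order_vector("research", rows))
```

The resulting vectors would then be clustered, for example with a cosine-based clustering method, to separate the instances of the name.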
“…One particular case of this people-document association task is referred to as personal name resolution [26,27,23] (also referred to as personal name disambiguation/discrimination [9,22], and cross-document co-reference [4,12]). The task is as follows: given a set of documents all of which refer to a particular person name but not necessarily to a single individual (usually called a referent), identify which documents are associated with each referent by that name.…”
Section: Introduction (mentioning)
confidence: 99%
“…The disambiguation of authors whose names coincide and overlap in the citation indexing systems calls for methods that distinguish the contexts in which authors or individuals appear, such as cross-document co-reference resolution algorithms that use the vector space model to resolve ambiguities (Bagga & Baldwin, 1998; Gooi & Allan, 2004; Han, Giles, Zha, Li, & Tsioutsiouliklis, 2004; Wacholder, Ravin, & Choi, 1997), co-occurrence analysis and clustering techniques (Han, Zha, & Giles, 2005; Mann & Yarowsky, 2003; Pedersen, Purandare, & Kulkarni, 2005), probabilistic similarity metrics (Torvik, Weeber, Swanson, & Smalheiser, 2005), or co-citation analysis and visualization mapping algorithms through a Kohonen network (X. Lin, White, & Buzydlowski, 2003; McCain, 1990).…”
Section: Introduction (mentioning)
confidence: 99%
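Vector-space approaches to cross-document co-reference compare documents mentioning the same name by cosine similarity and treat sufficiently similar pairs as referring to the same individual. The sketch below shows one simple variant of that idea. The raw term counts (no tf-idf weighting), the threshold value of 0.3, and the single-link merging via union-find are all assumptions chosen for illustration, not the procedure of any specific cited paper; the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two Counter term vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_documents(docs, threshold=0.3):
    """Group documents whose pairwise cosine similarity reaches `threshold`.

    Links are merged transitively (single-link) using union-find.
    """
    vecs = [Counter(d.lower().split()) for d in docs]
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    "george miller psychology memory seven plus or minus two",
    "george miller director mad max film",
    "miller psychology memory research chunking",
    "miller film director fury road",
]
print(link_documents(docs))  # → [[0, 2], [1, 3]]
```

Unlike clustering into a fixed number of groups, a similarity threshold lets the number of referents emerge from the data, at the cost of having to tune the threshold.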
“…$N_{ent}$ (11). That is, the average accuracy is defined as the average number of correct entity/instance assignments made by our naive algorithm divided by the total number of possible assignments. The total number of assignments coincides with the number of entities in the corpus ($N_{ent}$) due to the assumption that each entity has at least one candidate.…”
Section: Occ(e)p(Right/e) (mentioning)
confidence: 99%
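The accuracy measure described in this statement divides the number of correct entity/instance assignments by $N_{ent}$, which equals the number of possible assignments because every entity is assumed to have at least one candidate. A minimal sketch of that ratio follows. The dictionary representation (entity mapped to its assigned instance) and the function name are hypothetical, and the full form of Equation 11 is not recoverable from the excerpt.

```python
def average_accuracy(predicted, gold):
    """Correct entity->instance assignments divided by N_ent.

    `gold` maps every entity to its correct instance. Because each entity is
    assumed to have at least one candidate, the number of possible assignments
    equals N_ent = len(gold). Entities missing from `predicted` count as wrong.
    """
    n_ent = len(gold)
    correct = sum(1 for entity, instance in gold.items()
                  if predicted.get(entity) == instance)
    return correct / n_ent


gold = {"e1": "a", "e2": "b", "e3": "c", "e4": "d"}
predicted = {"e1": "a", "e2": "x", "e3": "c"}
print(average_accuracy(predicted, gold))  # → 0.5
```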
“…Those can be further classified into those that take a "bag of words" context (the position of the words taken as context is not considered), like [11], and those that try to use the role of each word in the context and its relation with the entity [9]. Although some approaches use both common words and named entities as context [11], others suggest that better results can be obtained using only other named entities as context [9]. - The use of knowledge sources like lexical databases, etc., that define the instances that should be matched against the entities and can provide information that can be exploited to perform the matching.…”
Section: Related Work (mentioning)
confidence: 99%
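The contrast between a bag-of-words context and a named-entity-only context can be made concrete with a token window around the ambiguous name. In the sketch below, both contexts ignore word position (they are returned as `Counter` multisets). The window size, the function names, and especially the capitalization test used to detect named entities are hypothetical stand-ins; a real system would use a proper named-entity tagger.

```python
from collections import Counter


def _window(tokens, target_index, window):
    """Yield (index, token) pairs within +/- window of the target, excluding it."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for k in range(lo, hi):
        if k != target_index:
            yield k, tokens[k]


def bag_of_words_context(tokens, target_index, window=4):
    """All surrounding words, lower-cased, with positions discarded."""
    return Counter(t.lower() for _, t in _window(tokens, target_index, window))


def named_entity_context(tokens, target_index, window=4):
    """Only capitalized neighbours: a crude stand-in for a real NER tagger."""
    return Counter(t for _, t in _window(tokens, target_index, window) if t[:1].isupper())


tokens = "the psychologist Miller met Smith at MIT in Boston".split()
print(bag_of_words_context(tokens, 2))
print(named_entity_context(tokens, 2))
```

The named-entity variant discards common words such as "psychologist", which here would have been informative; whether this filtering helps is exactly the empirical question the quoted statement raises.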