2013
DOI: 10.1371/journal.pone.0070299

Has Large-Scale Named-Entity Network Analysis Been Resting on a Flawed Assumption?

Abstract: The assumption that a name uniquely identifies an entity introduces two types of errors: splitting treats one entity as two or more (because of name variants); lumping treats multiple entities as if they were one (because of shared names). Here we investigate the extent to which splitting and lumping affect commonly-used measures of large-scale named-entity networks within two disambiguated bibliographic datasets: one for co-author names in biomedicine (PubMed, 2003–2007); the other for co-inventor names in U.…
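The two error types named in the abstract can be illustrated with a minimal sketch. The records below are invented toy data, not drawn from PubMed or the patent dataset, and the helper names (`build_network`, `degree`) are hypothetical. The sketch builds the same co-authorship network twice, once keyed on a true entity identifier and once keyed on the printed name, and then reports which entities are split and which names are lumped.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: each paper lists (true_entity_id, name_as_printed).
papers = [
    [("a1", "Jane Smith"), ("b1", "Wei Wang")],
    [("a1", "J. Smith"), ("c1", "Wei Wang")],  # a1 appears under a name variant; c1 shares b1's name
    [("b1", "Wei Wang"), ("d1", "Ana Lopez")],
]

def build_network(papers, key):
    """Return (nodes, edges) of a co-occurrence network whose nodes are key(author)."""
    nodes, edges = set(), set()
    for authors in papers:
        ids = sorted({key(a) for a in authors})
        nodes.update(ids)
        edges.update(combinations(ids, 2))
    return nodes, edges

def degree(node, edges):
    """Number of distinct edges touching the node."""
    return sum(node in edge for edge in edges)

# Network keyed on the true entity versus network keyed on the name string.
true_nodes, true_edges = build_network(papers, key=lambda a: a[0])
name_nodes, name_edges = build_network(papers, key=lambda a: a[1])

# Splitting: one entity with several names. Lumping: one name covering several entities.
names_per_entity = defaultdict(set)
entities_per_name = defaultdict(set)
for authors in papers:
    for entity, name in authors:
        names_per_entity[entity].add(name)
        entities_per_name[name].add(entity)

split_entities = {e for e, names in names_per_entity.items() if len(names) > 1}
lumped_names = {n for n, ents in entities_per_name.items() if len(ents) > 1}

print("split entities:", split_entities)
print("lumped names:", lumped_names)
print("degree of b1 (true):", degree("b1", true_edges))
print("degree of 'Wei Wang' (by name):", degree("Wei Wang", name_edges))
```

In this toy case the node count happens to be the same in both networks, because one split and one lump cancel out, while the degree of the lumped node is inflated and the split entity's degree is divided between two nodes. This is only an illustration of why aggregate counts can mask errors in per-node measures; it says nothing about the magnitudes reported in the paper.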

Cited by 40 publications (49 citation statements)
References 53 publications

“…The herein presented study is in line with the work by Fegley and Torvik (2013) and Strotmann and Zhao (2012) in that it attempts to estimate the effect of errors that are due to initial based disambiguation by comparing coauthorship networks generated from the same dataset by using different disambiguation methods. Our study differs from prior work in that we consider different domains, namely computer science and information science.…”
Section: Introduction
Citation type: mentioning
Confidence: 57%
“…Overall, the identification of biases and errors induced by initial based disambiguation is only possible if ground truth data is available. Since human-disambiguated coauthorship data are extremely rare and only available on a small scale, scholars have been using highly accurate computational solutions as a proxy (Fegley & Torvik, 2013; Strotmann & Zhao, 2012). Even though the most advanced algorithms cannot guarantee perfect disambiguation (Diesner & Carley, 2009), this strategy allows for comparing datasets and results based on initial based disambiguation and computationally disambiguated datasets.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
“…As noted by Kim and Diesner [5] this finding has been frequently cited, the paper has received almost 3500 Google Scholar citations, and a number of scholars have taken for granted that simple disambiguation methods can work quite efficiently. That last proposition has been challenged by Fegley and Torvik [6] for very large datasets and by Kim and Diesner [5] who shows that also in smaller datasets there might be substantial error rates. The latter authors indicate, by the use of more advanced methods for disambiguation that take several types of information into account, [1] that the rate of error can be substantial, especially in areas where there are high numbers of Asian named authors.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%