Peer-reviewed publications and patents serve as important signatures of knowledge generation; their authors and organizations can therefore be viewed as agents of intellectual transformation. Accurate tracking of these actors enables scholars to follow the evolution of knowledge. However, while author name disambiguation has been discussed extensively, less is known about the impact of organization names on bibliometric studies. Here we expand on the recently defined phenomenon of "onomastic profusion": high-frequency words that are used in organization names for semantic reasons and that therefore contribute a non-random source of error to bibliographic studies. We use the Small Business Innovation Research (SBIR) Phase I awardees of the National Aeronautics and Space Administration (NASA) as a use case in the field of engineering innovation. We find that firms in California or Massachusetts experience a six percent decrease in the likelihood of using the word "Technologies" in their names. Furthermore, use of the words "Research" and "Science" is associated with a doubling of the number of awards. We illustrate that, in aggregate, firms making rational, strategic naming decisions can create deterministic bibliometric challenges.
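To make the notion of onomastic profusion concrete, the following is a minimal sketch, not the authors' method, that flags words appearing in a large share of organization names. The firm names, the tokenization rule (lowercase alphanumeric runs), and the `min_share` threshold are hypothetical illustrations chosen only for this example.

```python
import re
from collections import Counter


def token_frequencies(names):
    """Count in how many organization names each word appears (document frequency)."""
    counts = Counter()
    for name in names:
        # Lowercase and split on non-alphanumeric characters; count each word once per name.
        tokens = set(re.findall(r"[a-z0-9]+", name.lower()))
        counts.update(tokens)
    return counts


def profuse_words(names, min_share=0.2):
    """Return words found in at least `min_share` of the names (hypothetical threshold)."""
    counts = token_frequencies(names)
    n = len(names)
    return sorted(word for word, count in counts.items() if count / n >= min_share)


# Hypothetical firm names, invented for illustration only.
firms = [
    "Acme Technologies Inc",
    "Orbital Research Corp",
    "Nova Science and Technologies LLC",
    "Helios Technologies",
    "Quantum Research Labs",
]
print(profuse_words(firms, min_share=0.4))
```

With these invented names, "technologies" and "research" are flagged because they recur across unrelated firms; a clustering step that leans on shared name tokens would treat such words as weak evidence of identity. In practice one would likely also strip legal suffixes such as "Inc" or "LLC" before counting.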
Bibliographic name disambiguation is a major semantic challenge, yet it is critical to social science studies of important intellectual assets. Here we contribute to innovation research in several ways. We demonstrate a significant synonym problem in author names and discuss how a heuristic pre-processing step that standardizes name variants helps; however, homonyms arising from Chinese names are particularly difficult to resolve and manifest in an associated location list. We also identify a new phenomenon, "onomastic profusion": the frequent use of certain words in firm names for semantic reasons, which can confound disambiguation clustering algorithms. We illustrate these concerns with Patentopia, our customized platform that accesses the PatentsView portal for the United States Patent and Trademark Office database and is available for free academic use. This multi-stage system uses heuristics in concert with the PatentsView clustering process and reports metadata to further assist analysis. As highly relevant use cases, we illustrate system performance with data derived from two important public innovation programs, I-Corps and Small Business Innovation Research (SBIR), and we close with implications for bibliometric analysis of current patent data.
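The contrast between synonyms and homonyms can be illustrated with a small, hypothetical name-standardization heuristic. This is a sketch under stated assumptions and not Patentopia's actual pre-processing step: the function `standardize_name` and its last-name-plus-first-initial key are invented here for illustration.

```python
import re
import unicodedata


def standardize_name(first, last):
    """Collapse common author-name variants into one key (hypothetical heuristic)."""

    def clean(s):
        # Strip accents, lowercase, and drop punctuation such as periods and hyphens.
        s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
        return re.sub(r"[^a-z ]", "", s.lower()).strip()

    first, last = clean(first), clean(last)
    # Reduce the given name(s) to the first initial so "John Q." and "J." match.
    return f"{last}_{first[:1]}"


# Synonyms: spelling variants of one person collapse to a single key.
variants = [("John Q.", "Smith"), ("J.", "Smith"), ("Jóhn", "Smith")]
print({standardize_name(f, l) for f, l in variants})

# Homonyms: distinct people also collapse to one key, which this heuristic cannot undo.
print(standardize_name("Wei", "Wang"), standardize_name("Wen", "Wang"))
```

The same key that merges the spelling variants of "Smith" also merges "Wei Wang" and "Wen Wang". This illustrates why standardization helps with synonyms yet does nothing for homonyms: common family names with short given names need additional evidence, such as location, to be separated.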