What factors affect a scientist's choice of research problem? Qualitative research in the history, philosophy, and sociology of science suggests that this choice is shaped by an "essential tension" between the professional demand for productivity and a conflicting drive toward risky innovation. We examine this tension empirically in the context of biomedical chemistry. We use complex networks to represent the evolving state of scientific knowledge, as expressed in publications. We then define research strategies relative to these networks. Scientists can introduce novel chemicals or chemical relationships-or delve deeper into known ones. They can consolidate existing knowledge clusters, or bridge distant ones. Analyzing such choices in aggregate, we find that the distribution of strategies remains remarkably stable, even as chemical knowledge grows dramatically. High-risk strategies, which explore new chemical relationships, are less prevalent in the literature, reflecting a growing focus on established knowledge at the expense of new opportunities. Research following a risky strategy is more likely to be ignored but also more likely to achieve high impact and recognition. While the outcome of a risky strategy has a higher expected reward than the outcome of a conservative strategy, the additional reward is insufficient to compensate for the additional risk. By studying the winners of 137 different prizes in biomedicine and chemistry, we show that the occasional "gamble" for extraordinary impact is the most plausible explanation for observed levels of risk-taking. Our empirical demonstration and unpacking of the "essential tension" suggests policy interventions that may foster more innovative research.
Directed networks are ubiquitous and are necessary to represent complex systems with asymmetric interactions-from food webs to the World Wide Web. Despite the importance of edge direction for detecting local and community structure, it has been disregarded in studying a basic type of global diversity in networks: the tendency of nodes with similar numbers of edges to connect. This tendency, called assortativity, affects crucial structural and dynamic properties of real-world networks, such as error tolerance or epidemic spreading. Here we demonstrate that edge direction has profound effects on assortativity. We define a set of four directed assortativity measures and assign statistical significance by comparison to randomized networks. We apply these measures to three network classes-online/social networks, food webs, and word-adjacency networks. Our measures (i) reveal patterns common to each class, (ii) separate networks that have been previously classified together, and (iii) expose limitations of several existing theoretical models. We reject the standard classification of directed networks as purely assortative or disassortative. Many display a class-specific mixture, likely reflecting functional or historical constraints, contingencies, and forces guiding the system's evolution.C omplex networks reveal essential features of the structure, function, and dynamics of many complex systems (1-4). While networks from diverse fields share various properties (3, 5-7) and universal patterns (1, 3), they also display enormous structural, functional, and dynamical diversity. A basic measure of diversity is assortativity by degree (hereafter assortativity): the tendency of nodes to link to other nodes with a similar number of edges (4,8,9). Despite its importance, no disciplined approach to assortativity in directed networks has been proposed. Here we present such an approach and show that measures of directed assortativity provide a number of insights into the structure of directed networks and key factors governing their evolution.Assortativity is a standard tool in analyzing network structure (4) and has a simple interpretation. In assortative networks with symmetric interactions (i.e., undirected networks), high degree nodes, or nodes with many edges, tend to connect to other high degree nodes. Hence, assortative networks remain connected despite node removal or failure (9), but are hard to immunize against the spread of epidemics (10). In disassortative networks, conversely, high degree nodes tend to connect to low degree nodes (8, 9); these networks limit the effects of node failure because important nodes (with many edges) are isolated from each other (11). Assortativity has a convenient global measure: the Pearson correlation (r) between the degrees of nodes sharing an edge (8, 9). It ranges from −1 to 1, with (r > 0) in assortative networks and (r < 0) in disassortative ones. Earlier work proposed a simple classification of networks on the basis of assortativity, in which social networks are assortative and b...
A scientist's choice of research problem affects his or her personal career trajectory. Scientists' combined choices affect the direction and efficiency of scientific discovery as a whole. In this paper, we infer preferences that shape problem selection from patterns of published findings and then quantify their efficiency. We represent research problems as links between scientific entities in a knowledge network. We then build a generative model of discovery informed by qualitative research on scientific problem selection. We map salient features from this literature to key network properties: an entity's importance corresponds to its degree centrality, and a problem's difficulty corresponds to the network distance it spans. Drawing on millions of papers and patents published over 30 years, we use this model to infer the typical research strategy used to explore chemical relationships in biomedicine. This strategy generates conservative research choices focused on building up knowledge around important molecules. These choices become more conservative over time. The observed strategy is efficient for initial exploration of the network and supports scientific careers that require steady output, but is inefficient for science as a whole. Through supercomputer experiments on a sample of the network, we study thousands of alternatives and identify strategies much more efficient at exploring mature knowledge networks. We find that increased risk-taking and the publication of experimental failures would substantially improve the speed of discovery. We consider institutional shifts in grant making, evaluation, and publication that would help realize these efficiencies.complex networks | computational biology | science of science | innovation | sociology of science A scientist's choice of research problem directly affects his or her career. Indirectly, it affects the scientific community. A prescient choice can result in a high-impact study. This boosts the scientist's reputation, but it can also create research opportunities across the field. Scientific choices are hard to quantify because of the complexity and dimensionality of the underlying problem space. In formal or computational models, problem spaces are typically encoded as simple choices between a few options (1, 2) or as highly abstract "landscapes" borrowed from evolutionary biology (3-5). The resulting insight about the relationship between research choice and collective efficiency is suggestive, but necessarily qualitative and abstract.We obtain concrete, quantitative insight by representing the growth of knowledge as an evolving network extracted from the literature (2, 6). Nodes in the network are scientific concepts and edges are the relations between them asserted in publications. For example, molecules-a core concept in chemistry, biology, and medicine-may be linked by physical interaction (7) or shared clinical relevance (8). Variations of this network metaphor for knowledge have appeared in philosophy (9), social studies of science (10-12), artific...
The growth of electronic publication and informatics archives makes it possible to harvest vast quantities of knowledge about knowledge, or "metaknowledge." We review the expanding scope of metaknowledge research, which uncovers regularities in scientific claims and infers the beliefs, preferences, research tools, and strategies behind those regularities. Metaknowledge research also investigates the effect of knowledge context on content. Teams and collaboration networks, institutional prestige, and new technologies all shape the substance and direction of research. We argue that as metaknowledge grows in breadth and quality, it will enable researchers to reshape science-to identify areas in need of reexamination, reweight former certainties, and point out new paths that cut across revealed assumptions, heuristics, and disciplinary boundaries.
a b s t r a c tScience is a complex system. Building on Latour's actor network theory, we model published science as a dynamic hypergraph and explore how this fabric provides a substrate for future scientific discovery. Using millions of abstracts from MEDLINE, we show that the network distance between biomedical things (i.e., people, methods, diseases, chemicals) is surprisingly small. We then show how science moves from questions answered in one year to problems investigated in the next through a weighted random walk model. Our analysis reveals intriguing modal dispositions in the way biomedical science evolves: methods play a bridging role and things of one type connect through things of another. This has the methodological implication that adding more node types to network models of science and other creative domains will likely lead to a superlinear increase in prediction and understanding.
Clustering, assortativity, and communities are key features of complex networks. We probe dependencies between these features and find that ensembles of networks with high clustering display both high assortativity by degree and prominent community structure, while ensembles with high assortativity show much less enhancement of the clustering or community structure. Further, clustering can amplify a small homophilic bias for trait assortativity in network ensembles. This marked asymmetry suggests that transitivity could play a larger role than homophily in determining the structure of many complex networks.
Divergent interests, expertise, and language form cultural barriers to communication. No formalism has been available to characterize these "cultural holes. " Here we use information theory to measure cultural holes and demonstrate our formalism in the context of scientific communication using papers from JSTOR. We extract scientific fields from the structure of citation flows and infer field-specific cultures by cataloging phrase frequencies in full text and measuring the relative efficiency of between-field communication. We then combine citation and cultural information in a novel topographic map of science, mapping citations to geographic distance and cultural holes to topography. By analyzing the full citation network, we find that communicative efficiency decays with citation distance in a field-specific way. These decay rates reveal hidden patterns of cohesion and fragmentation. For example, the ecological sciences are balkanized by jargon, whereas the social sciences are relatively integrated. Our results highlight the importance of enriching structural analyses with cultural data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.