ters or components. The result is a disconnected graph in which the points are distributed among the components. In this case, a partition is produced such that $\sum_{i=1}^{s} P_i = p$, where $s$ is the total number of components in the graph, $P_i$ is the number of points in, referred to as the order of, the $i$th component, and $p$ is the total number of points in the graph [2]. Components of order one ($P_i = 1$) are referred to as isolated points, and a graph composed of one component ($s = 1$) is said to be connected [2].
Certain partitions of co-citation graphs have been interpreted to represent the "macrostructure of science" [3]. In this interpretation, individual components are said to represent scientific specialties, and these components together with the underlying structures are said to provide "maps of scientific specialties" [3,4]. Although experimental evidence is consistent with these interpretations, the structure, and consequently the meaning, of a co-citation graph is strongly influenced by operational uncertainties associated with two fundamental assumptions [3-9]. First, it is assumed that "highly" cited papers represent "important" concepts and methods in science [10]. Second, it is assumed that "frequently" co-cited pairs are related by content [10]. Several questions arise. What values of $t_i$ reliably distinguish "important" papers from all other papers? What values of $t_o$ produce meaningful pairwise associations? Which combinations of $t_i$ and $t_o$ produce interpretable structures? If the investigation of bibliometric structures is to advance beyond simple data description, a stronger connection between the associated techniques and statistical theory must be developed in order to address such questions.
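The terminology above (components, order, isolated points, connected graph) can be illustrated with a minimal Python sketch. It is not the procedure used in this paper: the function name `cocitation_components`, the toy reference lists, and the convention that a threshold is met when a count is greater than or equal to it are all assumptions made for illustration. The sketch keeps documents cited at least `t_i` times, links pairs co-cited at least `t_o` times, and returns the component orders, which sum to the number of points.

```python
from collections import Counter
from itertools import combinations


def cocitation_components(reference_lists, t_i, t_o):
    """Return (component orders, number of points p) for a thresholded
    co-citation graph. Thresholds are applied as '>=' (an assumption)."""
    # Citation frequency: number of citing papers that reference each document.
    cites = Counter(doc for refs in reference_lists for doc in set(refs))
    points = sorted(d for d, c in cites.items() if c >= t_i)
    keep = set(points)

    # Co-citation strength: number of citing papers that reference both documents.
    pair_counts = Counter()
    for refs in reference_lists:
        pair_counts.update(combinations(sorted(set(refs) & keep), 2))

    # Union-find over the retained points.
    parent = {d: d for d in points}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), strength in pair_counts.items():
        if strength >= t_o:
            parent[find(a)] = find(b)

    orders = Counter(find(d) for d in points)
    return sorted(orders.values(), reverse=True), len(points)


# Hypothetical toy data: each inner list is the reference list of one citing paper.
citing = [["A", "B", "C"], ["A", "B"], ["A", "B", "D"],
          ["C", "D"], ["E", "A"], ["E", "B"]]

orders, p = cocitation_components(citing, t_i=2, t_o=2)
s = len(orders)                      # number of components
isolated = orders.count(1)           # components of order one
print(orders, s, p, isolated)        # [2, 1, 1, 1] 4 5 3
```

In this toy case only the pair (A, B) is co-cited often enough to form a link at `t_o = 2`, so the remaining three documents are isolated points. Lowering `t_o` to 1 yields a single component (a connected graph), which shows how strongly the partition depends on the choice of threshold.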
In this paper, the statistical validity of a co-citation multigraph is investigated as a function of the co-citation threshold $t_o$ for a given value of the citation threshold $t_i$. The results constitute a test of the Random Graph Hypothesis (RGH) in the context of co-citation clustering.
Using the Random Graph Hypothesis, the statistical validity of co-citation graphs has been investigated as a function of co-citation strength for a given value of citation frequency. The results show that for both high and low values of co-citation strength the partition of cited documents produced by the co-citation relationship may be statistically invalid. Critical thresholds can be identified that define the limits of statistical validity. Within these limits, there is a narrow region of statistical validity where the associated structures are not an artifact of the clustering procedure and can be interpreted. It is concluded that the choice of citation and co-citation thresholds can be influenced by formal considerations which ensure statistically meaningful partitions rather than arbitrary decisions which can produce meaningless interpretations. Experimental and theoretical implications for the co-citation graph and other bibliometric structures are discussed.