Jr. W. M. Shaw scite author profile

ters or components. The result is a disconnected graph in which the points are distributed among the components. In this case, a partition is produced such that $\sum_{i=1}^{s} p_i = p$, where $s$ is the total number of components in the graph, $p_i$ is the number of points in, referred to as the order of, the $i$th component, and $p$ is the total number of points in the graph [2]. Components of order one ($p_i = 1$) are referred to as isolated points, and a graph composed of one component ($s = 1$) is said to be connected [2]. Certain partitions of co-citation graphs have been interpreted to represent the "macrostructure of science" [3]. In this interpretation, individual components are said to represent scientific specialties, and these components together with the underlying structures are said to provide "maps of scientific specialties" [3,4]. Although experimental evidence is consistent with these interpretations, the structure, and consequently the meaning, of a co-citation graph is strongly influenced by operational uncertainties associated with two fundamental assumptions [3-9]. First, it is assumed that "highly" cited papers represent "important" concepts and methods in science [10]. Second, it is assumed that "frequently" co-cited pairs are related by content [10]. Several questions arise. What values of $t_i$ reliably distinguish "important" papers from all other papers? What values of $t_o$ produce meaningful pairwise associations? Which combinations of $t_i$ and $t_o$ produce interpretable structures? If the investigation of bibliometric structures is to advance beyond simple data description, a stronger connection between the associated techniques and statistical theory must be developed in order to address such questions.
In this paper, the statistical validity of a co-citation multigraph is investigated as a function of the co-citation threshold $t_o$ for a given value of the citation threshold $t_i$. The results constitute a test of the Random Graph Hypothesis (RGH) in the context of co-citation clustering.

Using the Random Graph Hypothesis, the statistical validity of co-citation graphs has been investigated as a function of co-citation strength for a given value of citation frequency. The results show that for both high and low values of co-citation strength the partition of cited documents produced by the co-citation relationship may be statistically invalid. Critical thresholds can be identified that define the limits of statistical validity. Within these limits, there is a narrow region of statistical validity where the associated structures are not an artifact of the clustering procedure and can be interpreted. It is concluded that the choice of citation and co-citation thresholds can be influenced by formal considerations which ensure statistically meaningful partitions rather than arbitrary decisions which can produce meaningless interpretations. Experimental and theoretical implications for the co-citation graph and other bibliometric structures are discussed.
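The partition into components described in the excerpt above can be illustrated with a minimal Python sketch. It is not the author's procedure: the function name `cocitation_components`, the input dictionaries, the toy counts, and the choice of `>=` as the threshold comparison are all assumptions made for illustration. It keeps papers whose citation count meets the citation threshold, links pairs whose co-citation count meets the co-citation threshold, and groups the surviving papers into components with a union-find structure.

```python
from collections import defaultdict


def cocitation_components(citations, cocitations, t_i, t_o):
    """Partition a thresholded co-citation graph into components.

    citations:   dict paper -> citation count (hypothetical input format)
    cocitations: dict (paper_a, paper_b) -> co-citation count
    Assumption: a paper is kept when its count is >= t_i, and an edge is
    drawn when the pair's co-citation count is >= t_o.
    """
    points = sorted(p for p, c in citations.items() if c >= t_i)
    parent = {p: p for p in points}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in cocitations.items():
        # Only link pairs that pass t_o and whose papers both passed t_i.
        if count >= t_o and a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for p in points:
        groups[find(p)].append(p)
    # Largest components first, ties broken alphabetically.
    return sorted(groups.values(), key=lambda g: (-len(g), g))


# Toy data, invented purely for illustration.
citations = {"A": 12, "B": 9, "C": 7, "D": 6, "E": 2}
cocitations = {("A", "B"): 5, ("B", "C"): 3, ("C", "D"): 1, ("D", "E"): 4}

components = cocitation_components(citations, cocitations, t_i=5, t_o=3)
orders = [len(c) for c in components]               # p_i for each component
isolated = [c[0] for c in components if len(c) == 1]  # components of order one
connected = len(components) == 1                    # s = 1 means connected
```

With these toy counts, the orders of the components sum to the number of papers that pass the citation threshold, which is the partition condition stated in the excerpt. Lowering `t_o` merges components and raising it splits them, which is the dependence on the co-citation threshold that the abstract examines.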

Subject indexing and citation indexing—part I: Clustering structure in the cystic fibrosis document collection

Shaw

1990

Information Processing & Management

Controlled and uncontrolled subject descriptions in the CF database: A comparison of optimal cluster-based retrieval results

Shaw

1993

Information Processing & Management

Term-relevance computations and perfect retrieval performance

Subject indexing and citation indexing—part II: An evaluation and comparison

Critical thresholds in co‐citation graphs
