A mathematical model is developed to provide a theoretical framework for a computer-oriented solution to the problem of recognizing those records in two files which represent identical persons, objects or events (said to be matched).

A comparison is to be made between the recorded characteristics and values in two records (one from each file) and a decision made as to whether or not the members of the comparison pair represent the same person or event, or whether there is insufficient evidence to justify either of these decisions at stipulated levels of error. These three decisions are referred to as a link ($A_1$), a non-link ($A_3$), and a possible link ($A_2$). The first two decisions are called positive dispositions.

The two types of error are defined as the error of the decision $A_1$ when the members of the comparison pair are in fact unmatched, and the error of the decision $A_3$ when the members of the comparison pair are in fact matched. The probabilities of these errors are defined as

$$\mu = \sum_{\gamma \in \Gamma} u(\gamma)\, P(A_1 \mid \gamma) \quad \text{and} \quad \lambda = \sum_{\gamma \in \Gamma} m(\gamma)\, P(A_3 \mid \gamma)$$

respectively, where $u(\gamma)$ and $m(\gamma)$ are the probabilities of realizing $\gamma$ (a comparison vector whose components are the coded agreements and disagreements on each characteristic) for unmatched and matched record pairs respectively. The summation is over the whole comparison space $\Gamma$ of possible realizations.

A linkage rule assigns probabilities $P(A_1 \mid \gamma)$, $P(A_2 \mid \gamma)$, and $P(A_3 \mid \gamma)$ to each possible realization of $\gamma \in \Gamma$. An optimal linkage rule $L(\mu, \lambda, \Gamma)$ is defined for each value of $(\mu, \lambda)$ as the rule that minimizes $P(A_2)$ at those error levels. In other words, for fixed levels of error, the rule minimizes the probability of failing to make positive dispositions.

A theorem describing the construction and properties of the optimal linkage rule and two corollaries to the theorem which make it a practical working tool are given.
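The error definitions above can be made concrete with a small numerical sketch. The abstract does not spell out how the optimal rule is constructed, so the code below rests on assumptions: it orders agreement patterns by the ratio m/u, treats the fields as conditionally independent, and uses a deterministic rule with no randomization at the thresholds, so the achieved error levels may fall below the stipulated ones. The per-field probabilities and the function names `pattern_probs` and `threshold_rule` are hypothetical.

```python
from itertools import product


def pattern_probs(field_probs):
    """Probability of every binary agreement pattern.

    Assumes fields agree independently of each other (an assumption of
    this sketch, not something stated in the abstract).
    """
    probs = {}
    for gamma in product((1, 0), repeat=len(field_probs)):
        p = 1.0
        for agree, q in zip(gamma, field_probs):
            p *= q if agree else 1.0 - q
        probs[gamma] = p
    return probs


def threshold_rule(m, u, mu_max, lam_max):
    """Deterministic three-way rule over patterns ranked by m/u.

    Returns the decision per pattern plus the achieved error levels:
    mu  = sum of u over patterns decided A1 (false links),
    lam = sum of m over patterns decided A3 (false non-links).
    """
    order = sorted(m, key=lambda g: m[g] / u[g], reverse=True)
    decision = {g: "A2" for g in order}  # default: possible link

    mu = 0.0
    for g in order:  # strongest evidence for a match first
        if mu + u[g] > mu_max:
            break
        decision[g] = "A1"
        mu += u[g]

    lam = 0.0
    for g in reversed(order):  # strongest evidence against a match first
        if decision[g] == "A1" or lam + m[g] > lam_max:
            break
        decision[g] = "A3"
        lam += m[g]

    return decision, mu, lam


# Hypothetical per-field agreement probabilities for three fields.
m = pattern_probs([0.9, 0.8, 0.95])   # matched pairs
u = pattern_probs([0.1, 0.2, 0.05])   # unmatched pairs

decision, mu, lam = threshold_rule(m, u, mu_max=0.01, lam_max=0.02)
for gamma in sorted(decision, key=lambda g: m[g] / u[g], reverse=True):
    print(gamma, decision[gamma])
print("achieved mu:", mu, "achieved lambda:", lam)
```

Patterns left as `A2` are the possible links; tightening `mu_max` or `lam_max` enlarges that set, which is the trade-off the optimal rule is meant to manage.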
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.
International Statistical Institute (ISI) is collaborating with JSTOR to digitize, preserve and extend access to International Statistical Review / Revue Internationale de Statistique.
Summary

The problem of sampling with probability proportional to a measure of size and with certain requirements for the probabilities of joint selection of pairs of units has a history that goes back to about 1943. However, although more than 50 articles on the subject have been published, none of them has given a completely satisfactory solution and none of them has emerged, at least for sample sizes greater than two, as a practical alternative to the usual method of systematic sampling with probability proportional to size. The solution given in this paper is exact, simple and intuitively appealing.