We show how Bayesian probability models can be used to integrate two databases, one of which lacks a key that uniquely identifies clients (e.g., social security number or medical record number). The analyst selects a set of imperfect identifiers (last visit diagnosis, first name, etc.). For each identifier, the algorithm assesses the associated likelihood ratio from the database of known cases, and from these likelihood ratios it estimates the probability that two records belong to the same client. As it examines the identifiers in turn, it accounts for interdependencies among them, so overlapping and redundant identifiers can be used. We test the effectiveness of the procedure on the Medical Expenditure Panel Survey (MEPS) Population Characteristics data set, which is publicly available. We randomly selected 1,000 cases as the training data set; these constituted the known cases. The algorithm was then asked to classify 100 cases not in the training data set as either a case in the training set or a new case. With 12 fields as identifiers, all 100 cases were correctly classified as new cases. We also selected 100 known cases from the training set and asked the algorithm to classify them. Again, all 100 cases were correctly classified. Results were less accurate when the training data set was too small (e.g., fewer than 100 records) or when too few fields were used as identifiers (e.g., fewer than seven). In a test of the algorithm's performance, when the ratio of the testing to the training data set exceeded 4 to 1, accuracy exceeded 90% of cases. As the ratio increases, accuracy improves further. These data suggest that our automated, mathematical procedure can accurately merge data from two different data sets in the absence of a unique identifier.
The algorithm uses imperfect and overlapping clues to re-identify cases from information not typically considered to be a patient identifier.
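As a rough illustration of the likelihood-ratio step described in the abstract above, the following minimal sketch turns per-identifier likelihood ratios into a posterior match probability. It is not the authors' implementation: it assumes the identifiers are conditionally independent (the paper says it accounts for interdependencies, which this sketch does not attempt), and the function names, the add-one smoothing, and the example numbers are all hypothetical.

```python
from math import prod


def likelihood_ratio(match_agree, match_total, nonmatch_agree, nonmatch_total):
    """Likelihood ratio for 'two records agree on this identifier'.

    Counts come from record pairs in the known cases. Add-one smoothing
    (an assumption of this sketch) avoids division by zero.
    """
    p_agree_given_match = (match_agree + 1) / (match_total + 2)
    p_agree_given_nonmatch = (nonmatch_agree + 1) / (nonmatch_total + 2)
    return p_agree_given_match / p_agree_given_nonmatch


def posterior_match_probability(prior, likelihood_ratios):
    """Bayes' rule in odds form: posterior odds = prior odds * product of LRs.

    Multiplying the ratios assumes conditional independence between
    identifiers, a simplification the paper itself goes beyond.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1 + posterior_odds)


# Hypothetical usage: three identifiers that agree, with a low prior.
ratios = [
    likelihood_ratio(95, 100, 2, 100),   # e.g. first name
    likelihood_ratio(80, 100, 10, 100),  # e.g. last visit diagnosis
    likelihood_ratio(90, 100, 30, 100),  # e.g. region
]
probability = posterior_match_probability(prior=0.001, likelihood_ratios=ratios)
```

Working in odds keeps each identifier's contribution as a single multiplicative factor, which makes it easy to see how adding fields raises or lowers confidence in a match.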
[...] is an unrealized goal with great potential. Interoperability can facilitate training, simulation-based acquisition, mission planning and rehearsal, and course of action development and analysis. Recent interoperability research has concentrated on general-purpose approaches that can provide standards and reuse across a wide range of systems. One important component of interoperability is data model alignment, the degree to which the data models of two systems use the same elements. This paper presents a rigorous definition of data model alignment and uses it to assess the degree of alignment between the Army Integrated Core Data Model (AICDM) and the standard objects defined by the Object Management Standards Category (OMSC), two important emerging standards in the C4I and M&S communities, respectively. This assessment is used to make recommendations on changes to each model that would promote interoperability. The conclusion is that the OMSC standard objects need considerable work to model C4I data. [...] Command, Control, Communications, and Computers (ODISC4). It was performed in response to a task objective to develop a series of recommendations for how to align the underlying data representations that are required in C4I and in modeling and simulation (M&S) systems.
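The abstract above describes data model alignment as the degree to which two data models use the same elements. The paper's rigorous definition is not reproduced here, so the sketch below is only a stand-in: it treats each model as a set of element names and reports the shared elements and the fraction of each model they cover. The function name and the element names are hypothetical and are not taken from AICDM or OMSC.

```python
def alignment(elements_a, elements_b):
    """Set-overlap summary of two data models (an assumed, simplified measure)."""
    a, b = set(elements_a), set(elements_b)
    shared = a & b
    return {
        "shared": shared,
        # Fraction of each model's elements that also appear in the other.
        "coverage_a": len(shared) / len(a) if a else 0.0,
        "coverage_b": len(shared) / len(b) if b else 0.0,
    }


# Hypothetical element names, for illustration only.
model_a = {"unit", "location", "time", "equipment"}
model_b = {"unit", "location", "sensor"}
report = alignment(model_a, model_b)
```

Reporting coverage separately for each model, rather than as a single symmetric score, shows which of the two models would need more change to align with the other.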