Counting Couples: The Marriage Banns Registers of the City of Amsterdam, 1580–1810

The problem of entity resolution is central in the field of Digital Humanities. It is also one of the major issues in the Golden Agents project, which aims to create an infrastructure that enables researchers to search for patterns that span decentralised knowledge graphs from cultural heritage institutes. To this end, we created a method to perform entity resolution on complex historical knowledge graphs. In previous work, we encoded and embedded the relevant (duplicate) entities in a vector space to derive similarities between them based on their sharing a similar context in RDF graphs. In some cases, however, available domain knowledge or rational axioms can be applied to improve entity resolution performance. We show how domain knowledge and rational axioms relevant to the task at hand can be expressed as (probabilistic) rules, and how the information derived from rule application can be combined with quantitative information from the embedding. In this work, we apply our entity resolution method to two data sets. First, we apply it to a data set for which we have a detailed ground truth for validation. This experiment shows that combining the embedding with domain knowledge and rational axioms leads to improved resolution performance. Second, we perform a case study by applying our method to a larger data set for which there is no ground truth and where the outcome is subsequently validated by a domain expert. The results of this case study demonstrate that our method achieves very high precision.

“…[=Yes I do!] [19]. Both the original Notice of Marriage index as well as this enrichment have been reconciled in the Golden Agents project.…”

Section: Notice Of Marriage Registrations

mentioning

confidence: 99%

Adding Domain Knowledge to Improve Entity Resolution in 17th and 18th Century Amsterdam Archival Records

Baas, Wissen, Reinders et al. 2022

Towards a Knowledge-Aware AI

“…(Yes I do!) [65], who have digitized the records for every fifth year. This means that for 20% of this data, we also have information on among others the witnesses participating in the event.…”

Section: 1

mentioning

confidence: 99%

Entity Resolution on Historical Knowledge Graphs

Baas

Semantic web technology is increasingly being used in projects of humanities researchers, such as historians and literary scholars. This technology makes it easier to access large-scale data sets from the cultural heritage world, such as the indexes on persons and locations of the Amsterdam City Archives. Semantic web technology also facilitates the integration of different data sources into knowledge graphs, which in turn enables cross-data set analyses that were previously infeasible. This makes it possible, for example, to reconstruct someone's life on the basis of primary archival sources. However, the integration of different historical data sets entails a number of complications. Because the majority of these archival data sets are aimed at providing quick and easy access, it is likely that the same person is included multiple times within a single data set, or appears multiple times across different data sets, each time with a new entry and unique identifier. Until these unique entries are resolved and the duplicate entities are disambiguated, it is not possible to conduct this type of investigation. This dissertation offers a solution to this problem and describes a method to reduce, if not eliminate, the number of duplicates in a set of knowledge graphs by clustering these unique entries, where each cluster represents a single real-life object. To this end, it describes the application of so-called 'embeddings': a technique for making computer-readable representations of entities such as nodes in a knowledge graph. The method described in this thesis constructs the embeddings such that a high similarity between two nodes is indicative of a duplicate. However, relying on these pairwise matches is not without risks. For example, the application of a threshold value can lead to a transitivity violation. That is, the entity pairs $(A, B)$ and $(B, C)$ can both have high similarity, but this need not be the case for pair $(A, C)$. To address this issue, we employ algorithms that use the pairwise similarities to find clusters that conform as closely as possible to the computed similarities. Nevertheless, these clustering algorithms can still produce false positives and false negatives. To counteract this and significantly improve the clustering results, this work describes the use of domain-specific knowledge and constraints to detect and correct clustering errors. Examples of such constraints are that one cannot marry oneself and that a person is baptized before being buried.
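The transitivity problem and the role of a domain constraint can be illustrated with a small sketch. This is not the dissertation's implementation: the similarity values, the `THRESHOLD` cut-off, the record names and the `cannot_be_same` constraint are all invented for illustration, and plain connected components stand in for whatever clustering algorithm the thesis actually uses.

```python
from itertools import combinations

# Hypothetical pairwise similarities between three records (invented values).
similarity = {("A", "B"): 0.92, ("B", "C"): 0.90, ("A", "C"): 0.41}
THRESHOLD = 0.8  # assumed cut-off, not taken from the source


def sim(x, y):
    """Symmetric lookup of the similarity between two records."""
    return similarity.get((x, y), similarity.get((y, x), 0.0))


def matches(nodes, threshold):
    """Unordered pairs whose similarity reaches the threshold."""
    return {frozenset(p) for p in combinations(nodes, 2) if sim(*p) >= threshold}


def transitivity_violations(nodes, threshold):
    """Triples (a, b, c) where a~b and b~c match but a~c does not."""
    matched = matches(nodes, threshold)
    violations = []
    for b in nodes:
        others = [n for n in nodes if n != b]
        for a, c in combinations(others, 2):
            if (frozenset((a, b)) in matched
                    and frozenset((b, c)) in matched
                    and frozenset((a, c)) not in matched):
                violations.append((a, b, c))
    return violations


def connected_components(nodes, threshold):
    """Naive clustering: merge every matched pair with union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for pair in matches(nodes, threshold):
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())


def constraint_conflicts(clusters, cannot_be_same):
    """Forbidden pairs that nevertheless ended up in the same cluster."""
    return [pair for pair in cannot_be_same
            if any(pair <= cluster for cluster in clusters)]


nodes = ["A", "B", "C"]
violations = transitivity_violations(nodes, THRESHOLD)
clusters = connected_components(nodes, THRESHOLD)
# Assumed domain rule: A and C are bride and groom in one record,
# so they cannot refer to the same person.
cannot_be_same = [frozenset(("A", "C"))]
conflicts = constraint_conflicts(clusters, cannot_be_same)
```

In this toy setting, thresholding alone merges all three records into one cluster even though $(A, C)$ has low similarity, and the assumed "one cannot marry oneself" rule flags that cluster as containing an error that needs correcting.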

Counting Couples: The Marriage Banns Registers of the City of Amsterdam, 1580–1810

Cited by 2 publications

References 27 publications

Adding Domain Knowledge to Improve Entity Resolution in 17th and 18th Century Amsterdam Archival Records

Entity Resolution on Historical Knowledge Graphs
