Semantic web technology is increasingly used in the projects of humanities researchers, such as historians and literary scholars. It makes large-scale data sets from the cultural heritage sector easier to access, for example the person and place indexes of the Amsterdam City Archives. Semantic web technology also facilitates the integration of different data sources into knowledge graphs, which in turn enables cross-data-set analyses that were previously infeasible. This makes it possible, for example, to reconstruct someone's life on the basis of primary archival sources. However, integrating different historical data sets entails a number of complications. Because most of these archival data sets are designed to provide quick and easy access rather than to identify individuals, the same person is often recorded several times within a single data set and may also appear in several different data sets, each time as a new entry with its own unique identifier. Until these entries are resolved and the duplicate entities disambiguated, this type of investigation cannot be carried out.
This dissertation offers a solution to this problem: a method that reduces, if not eliminates, the number of duplicates in a set of knowledge graphs by clustering these entries so that each cluster represents a single real-world entity. To this end, it applies so-called `embeddings': a technique for constructing machine-readable vector representations of entities such as the nodes of a knowledge graph. The method described in this thesis constructs the embeddings such that a high similarity between two nodes indicates a likely duplicate. However, relying on these pairwise matches is not without risks. For example, deciding matches by applying a threshold to the pairwise similarities can violate transitivity: the pairs $(A, B)$ and $(B, C)$ may both exceed the threshold while the pair $(A, C)$ does not, even though identity is transitive. To address this issue, we employ clustering algorithms that use the pairwise similarities to find clusters that agree as closely as possible with them. Nevertheless, these clustering algorithms can still produce false positives and false negatives. To counteract this and significantly improve the clustering results, this work uses domain-specific knowledge and constraints to detect and correct clustering errors. Examples of such constraints are that a person cannot marry themselves, and that a person must be baptized before being buried.
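The pipeline described above (pairwise similarity, thresholding, clustering, constraint checking) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the dissertation's implementation: cosine similarity as the similarity measure, a fixed threshold of 0.75, toy two-dimensional embeddings, a deterministic variant of the pivot algorithm for correlation clustering (KwikCluster) standing in for the clustering algorithms the thesis actually uses, and a hypothetical `Record` type whose `baptism`, `burial`, and `spouse_of` fields encode the two example constraints.

```python
import itertools
import math
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Sequence, Tuple

Embeddings = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def transitivity_violations(emb: Embeddings, threshold: float) -> List[Tuple[str, str, str]]:
    """Triples (a, b, c) where a~b and b~c pass the threshold but a~c does not."""
    keys = sorted(emb)
    found = []
    for b in keys:
        others = [k for k in keys if k != b]
        for a, c in itertools.combinations(others, 2):
            if (cosine(emb[a], emb[b]) >= threshold
                    and cosine(emb[b], emb[c]) >= threshold
                    and cosine(emb[a], emb[c]) < threshold):
                found.append((a, b, c))
    return found


def pivot_cluster(emb: Embeddings, threshold: float) -> List[List[str]]:
    """Pivot-style correlation clustering. KwikCluster picks pivots at random;
    here the first unclustered key (sorted order) is used so results are deterministic."""
    unclustered = sorted(emb)
    clusters = []
    while unclustered:
        pivot = unclustered[0]
        # The pivot absorbs every still-unclustered node similar enough to it.
        cluster = {pivot} | {k for k in unclustered[1:]
                             if cosine(emb[pivot], emb[k]) >= threshold}
        clusters.append(sorted(cluster))
        unclustered = [k for k in unclustered if k not in cluster]
    return clusters


@dataclass
class Record:
    """Hypothetical archival record; fields chosen only to express the example constraints."""
    rid: str
    baptism: Optional[date] = None
    burial: Optional[date] = None
    spouse_of: Optional[str] = None  # id of the other spouse in a marriage record


def violates_constraints(cluster: List[Record]) -> bool:
    """True if treating all records in the cluster as one person breaks a constraint."""
    ids = {r.rid for r in cluster}
    # A person cannot marry themselves: both spouses must not share a cluster.
    if any(r.spouse_of in ids for r in cluster):
        return True
    # A person is baptized before being buried: no burial may precede a baptism.
    baptisms = [r.baptism for r in cluster if r.baptism is not None]
    burials = [r.burial for r in cluster if r.burial is not None]
    return bool(baptisms and burials and min(burials) < max(baptisms))


# Toy embeddings: sim(A, B) = sim(B, C) = 0.8, but sim(A, C) = 0.28.
emb = {"A": [1.0, 0.0], "B": [0.8, 0.6], "C": [0.28, 0.96]}
print(transitivity_violations(emb, 0.75))  # [('A', 'B', 'C')]
print(pivot_cluster(emb, 0.75))            # [['A', 'B'], ['C']]
print(violates_constraints([Record("A", baptism=date(1700, 3, 1)),
                            Record("B", burial=date(1690, 5, 10))]))  # True
```

In this toy example the clustering resolves the transitivity violation by keeping $A$ and $B$ together and leaving $C$ on its own, and the constraint check would then flag the $\{A, B\}$ cluster as an error if $B$'s burial predates $A$'s baptism.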