“…Before feeding the record sets returned by data extraction into a particular application, it is commonly necessary to perform some of the following integration tasks: semantisation [25,45,54,55,60,63,71], which maps either the descriptors onto the terminology box of a particular ontology or the tuples onto its assertion box [19]; union [23], which merges record sets that provide similar data; finding primary keys [62], which determines which components of the tuples identify them as univocally as possible; record linkage [8,11,12], which finds different records that refer to the same actual entities; augmentation [6,52,67], which joins record sets on the same topic to complete the information that they provide individually; and cleaning [10,31,61], which fixes errors in the data. Note that the integration tasks are orthogonal to data extraction because they are independent of the source of the record sets, which is why they fall outside the scope of this article.…”
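To make three of these tasks concrete, the following minimal Python sketch illustrates union, finding primary keys, and augmentation on toy record sets. It is not taken from the cited works: the function names (`union`, `find_primary_key`, `augment`), the representation of records as dictionaries, and the sample data are all assumptions chosen for illustration, and the key search is a brute-force approach that would only suit small inputs.

```python
from itertools import combinations

# Hypothetical helpers; records are plain dicts with hashable values.

def union(*record_sets):
    """Merge record sets that provide similar data, dropping exact duplicates."""
    seen, merged = set(), []
    for records in record_sets:
        for rec in records:
            fingerprint = tuple(sorted(rec.items()))
            if fingerprint not in seen:
                seen.add(fingerprint)
                merged.append(rec)
    return merged

def find_primary_key(records):
    """Return the smallest attribute combination whose values are unique."""
    attrs = sorted({a for rec in records for a in rec})
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            values = [tuple(rec.get(a) for a in combo) for rec in records]
            if len(set(values)) == len(values):
                return combo
    return None  # no combination identifies every record uniquely

def augment(left, right, key):
    """Join two record sets on the key to complete each left record."""
    index = {tuple(r[a] for a in key): r for r in right}
    return [{**index.get(tuple(l[a] for a in key), {}), **l} for l in left]

# Toy data, invented for this example.
books_a = [{"isbn": "111", "title": "Dune"}, {"isbn": "222", "title": "Emma"}]
books_b = [{"isbn": "222", "title": "Emma"}, {"isbn": "333", "title": "Ulysses"}]
prices = [{"isbn": "111", "price": 9.5}, {"isbn": "333", "price": 12.0}]

merged = union(books_a, books_b)
key = find_primary_key(merged)
augmented = augment(merged, prices, key)
```

Because none of the functions refers to where the record sets came from, the sketch also reflects the point made in the passage: these tasks operate on record sets regardless of how they were extracted.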