Reuse and Adaptation for Entity Resolution through Transfer Learning

Thirumuruganathan, Saravanan; Parambath, Shameem Ahamed Puthiya; Ouzzani, Mourad; Tang, Nan; Joty, Shafiq

doi:10.48550/arxiv.1809.11084

Cited by 2 publications (5 citation statements)

References 25 publications (53 reference statements)


“…• TLER [34] is a non-deep transfer learning framework that defines a standard feature space and reuses the seen data to train models for the new domain.…”

Section: Methods (mentioning)

confidence: 99%

“…A popular approach is to adapt the pre-trained model for the new task through fine-tuning [20], or by adding new functions to specific tasks such as object detection [15]. In terms of EL, TLER [34] is a non-deep method that reuses and adopts seen data from the source domain to train models for the new domain. Auto-EM [40] proposes to pre-train models for both attribute-type (i.e., schema) and attribute value matching based on word-and character-level similarity.…”

Section: Related Work (mentioning)

confidence: 99%

“…We define the attribute importance in entity linkage as the high-level transferable knowledge and automatically learn it through a proposed attribute-level attention mechanism (what to transfer). In general, as transfer learning aims to transfer knowledge learned from the domain with abundant training data to a related target domain with limited data, the existing works either rely on increasing the labeling volume by introducing the external data (e.g., public knowledge bases) [40] or reusing the seen training data [34]. On the contrary, AdaMEL adopts domain adaptation (DA) to jointly update the attention scores for attributes in both the seen and unseen data as the basis for entity linkage (how to handle multiple sources), so that the knowledge is adaptive to the continuously incoming data sources.…”

Section: Introduction (mentioning)

confidence: 99%


Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation

Jin, Sisman, Wei et al. 2021

Preprint


Multi-source entity linkage focuses on integrating knowledge from multiple sources by linking the records that represent the same real world entity. This is critical in high-impact applications such as data cleaning and user stitching. The state-of-the-art entity linkage pipelines mainly depend on supervised learning that requires abundant amounts of training data. However, collecting well-labeled training data becomes expensive when the data from many sources arrives incrementally over time. Moreover, the trained models can easily overfit to specific data sources, and thus fail to generalize to new sources due to significant differences in data and label distributions. To address these challenges, we present AdaMEL, a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage. AdaMEL models the attribute importance that is used to match entities through an attribute-level self-attention mechanism, and leverages the massive unlabeled data from new data sources through domain adaptation to make it generic and data-source agnostic. In addition, AdaMEL is capable of incorporating an additional set of labeled data to more accurately integrate data sources with different attribute importance. Extensive experiments show that our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning. Besides, it is more stable in handling different sets of data sources in less runtime.
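The idea of weighting attributes by importance when matching two records can be illustrated with a minimal sketch. This is not AdaMEL's model: the abstract describes a learned attribute-level self-attention mechanism, whereas the sketch below assumes fixed, hand-picked importance logits and a simple token-overlap similarity. The names `token_jaccard`, `match_score`, and `attr_logits` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw importance logits into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_jaccard(a, b):
    """Token-level Jaccard similarity between two attribute values."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0  # both values missing: treat as no evidence of a match
    return len(ta & tb) / len(ta | tb)


def match_score(rec_a, rec_b, attr_logits):
    """Attention-weighted sum of per-attribute similarities.

    attr_logits is an assumed, hand-set stand-in for the importance
    scores that a trained model would produce.
    """
    attrs = sorted(attr_logits)
    weights = softmax([attr_logits[a] for a in attrs])
    sims = [token_jaccard(rec_a.get(a, ""), rec_b.get(a, "")) for a in attrs]
    return sum(w * s for w, s in zip(weights, sims))


if __name__ == "__main__":
    a = {"title": "Let It Be", "artist": "The Beatles"}
    b = {"title": "Let It Be", "artist": "Beatles"}
    # Hypothetical logits: title is assumed to matter more than artist.
    print(match_score(a, b, {"title": 2.0, "artist": 0.0}))
```

Because the weights sum to 1 and each similarity lies in [0, 1], the score also lies in [0, 1]; a disagreement on a low-weight attribute lowers the score only slightly.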


“…Entity matching (EM) [16], which is to identify data instances that refer to the same real-world entity, is also related. Some EM works also employ a deep learning-based approach [24], [37], [42], [49], [57], [73], [82]. Mudgal et al. [57] evaluate and compare the performance of different deep learning models applied to EM with three types of data: structured data, textual data, and dirty data (with missing values, inconsistent attributes and/or misplaced values).…”

Section: Schema/entity Matching (mentioning)

confidence: 99%

“…In the past few years, deep learning (DL) has become the most popular direction in machine learning and artificial intelligence [46], [65], and has transformed a lot of research areas, such as image recognition, computer vision, speech recognition, natural language processing, etc. In recent years, DL has been applied to database systems and applications to facilitate parameter tuning [47], [71], [76], [81], indexing [21], [43], partitioning [34], [86], cardinality estimation and query optimization [39], [44], and entity matching [24], [37], [42], [57], [73], [82]. While predictions based on deep learning cannot guarantee correctness, in the Big Data era, errors in data integration are usually tolerable as long as most of the data is correct, which is another motivation of our work.…”

Section: Introduction (mentioning)

confidence: 99%

Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning

Wang, Zhou, Das et al. 2020

Preprint


Data is the king in the age of AI. However, data integration is often a laborious task that is hard to automate. Schema change is one significant obstacle to the automation of the end-to-end data integration process. Although there exist mechanisms such as query discovery and schema modification language to handle the problem, these approaches can only work with the assumption that the schema is maintained by a database. However, we observe diversified schema changes in heterogeneous data and open data, most of which has no schema defined. In this work, we propose to use deep learning to automatically deal with schema changes through a super cell representation and automatic injection of perturbations to the training data to make the model robust to schema changes. Our experimental results demonstrate that our proposed approach is effective for two real-world data integration scenarios: coronavirus data integration, and machine log integration.
