PromptEM
2022
DOI: 10.14778/3565816.3565836

Abstract: Entity Matching (EM), which aims to identify whether two entity records from two relational tables refer to the same real-world entity, is one of the fundamental problems in data management. Traditional EM assumes that the two tables are homogeneous with aligned schemas, while in practical scenarios entity records commonly come in different formats (e.g., relational, semi-structured, or textual types). It is not practical to unify their schemas because of these format differences. To support EM on format-d…
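The format heterogeneity described in the abstract can be made concrete with a tiny example. The records and field names below are hypothetical, chosen only to show how one real-world entity may appear as a relational row, a semi-structured object, and free text with no common schema to align.

```python
# The same real-world entity in three formats; values are made up for illustration.
relational_record = {"title": "iPhone 13 Pro", "storage": "128GB", "color": "graphite"}

semi_structured_record = {  # JSON-like, nested and only partially overlapping fields
    "name": "Apple iPhone 13 Pro",
    "specs": {"capacity": "128 GB", "colour": "Graphite"},
}

textual_record = "Apple iPhone 13 Pro smartphone, 128 GB, graphite finish."

# Classical EM assumes the two tables share an aligned schema; here no common
# schema exists, which is the generalized EM setting the paper targets.
print(relational_record, semi_structured_record, textual_record, sep="\n")
```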

Cited by 14 publications (2 citation statements)
References 27 publications

“…We study the scheduling and coordination of the individual ER algorithms in order to resolve the multiple datasets, and show the scalability of our approach… The conflict between efficiency and effectiveness: entity resolution models based on PLMs can be divided into two categories with respect to representation learning: independent or interdependent representation [9]. Interdependent representation models [2,3,10,11] allow deep interaction between pairs of records through attention mechanisms, resulting in better matching quality. Despite being effective, interdependent representation models suffer from poor scalability due to the quadratic search space of record pairs and thus require an additional blocking step.…”
Section: Joint Entity Resolution on Multiple Datasets (mentioning)
confidence: 99%
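To make the independent vs. interdependent distinction in the quoted passage concrete, the sketch below contrasts the two ways a PLM can represent a record pair. It is a minimal illustration under stated assumptions, not code from the cited models: the model name (bert-base-uncased), the example records, and the helper cls_embedding are assumptions, and a real matcher would be fine-tuned rather than used off the shelf.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical records and model choice, for illustration only.
left = "iPhone 13 Pro | 128GB | graphite"
right = "Apple iPhone 13 Pro smartphone, 128 GB, graphite finish"

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def cls_embedding(*texts):
    """[CLS] vector of a single record, or of a jointly encoded record pair."""
    batch = tokenizer(*texts, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return encoder(**batch).last_hidden_state[:, 0]

# Independent (bi-encoder) representation: each record is encoded on its own,
# so embeddings can be precomputed and indexed; matching reduces to similarity.
sim = torch.nn.functional.cosine_similarity(cls_embedding(left), cls_embedding(right))

# Interdependent (cross-encoder) representation: the pair is encoded together,
# so tokens of one record attend to the other -- typically better quality, but
# the joint encoding must be recomputed for each of the O(n^2) candidate pairs.
pair_repr = cls_embedding(left, right)  # would feed a trained match classifier

print(f"bi-encoder cosine similarity: {sim.item():.3f}")
print(f"cross-encoder pair representation shape: {tuple(pair_repr.shape)}")
```

The independent embeddings can be computed once and reused for blocking, whereas the joint encoding must be repeated for every candidate pair, which is exactly the efficiency/effectiveness trade-off the quoted passage describes.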
“…By constructing prompt templates, downstream tasks can be transformed into the fill-in-the-blank form of the pre-training (upstream) task, which more effectively exploits the original network structure of the PLM and the prior knowledge obtained during pre-training. PromptEM [10] is the first work that applies prompt tuning to entity resolution, and it performs well under both low-resource and sufficient-resource settings. Recently, some work has also adopted prompt-based methods to eliminate embedding bias in PLMs [24], which can likewise be used to improve the embedding quality of records in entity resolution tasks.…”
Section: Related Work (mentioning)
confidence: 99%
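The fill-in-the-blank idea described above can be shown with a small cloze-style sketch. The template wording, the verbalizer words ("same" / "different"), the example records, and the model choice are illustrative assumptions, not PromptEM's actual template or verbalizer; the zero-shot prediction below only demonstrates the formulation, whereas in practice the prompt and model are tuned on labeled pairs.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Hypothetical records; the template and verbalizer below are illustrative,
# not the template/verbalizer used by PromptEM.
left = "iPhone 13 Pro | 128GB | graphite"
right = "Apple iPhone 13 Pro smartphone, 128 GB, graphite finish"

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

# Cloze template: the matching decision becomes a masked-token prediction,
# i.e. the same task format the PLM saw during pre-training.
prompt = f"{left} and {right} refer to the {tokenizer.mask_token} entity."
inputs = tokenizer(prompt, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = mlm(**inputs).logits

# Score the verbalizer words at the [MASK] position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
mask_logits = logits[0, mask_pos[0]]
score_same = mask_logits[tokenizer.convert_tokens_to_ids("same")].item()
score_diff = mask_logits[tokenizer.convert_tokens_to_ids("different")].item()

print("match" if score_same > score_diff else "non-match")
```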