Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework

Chen, Zhaoqiang; Chen, Qun; Fan, Fengfeng; Wang, Yanyan; Wang, Zhuo; Nafa, Youcef; Li, Zhanhuai; Liu, Hailong; Pan, Wei

doi:10.1109/icde.2018.00107

Cited by 6 publications

(14 citation statements)

References 28 publications

“…The r-HUMO framework is built on the recently proposed HUMO framework [14], [15], which can enforce quality guarantees at both precision and recall fronts. The general idea of HUMO and r-HUMO was similar to the Fellegi-Sunter theory of record linking [3], which also proposed to divide an ER workload into three parts based on match probability.…”

Section: Related Work (mentioning)

confidence: 99%

“…the set of instance pairs with the feature f. For presentation simplicity, we summarize the frequently used notations in Table 1. Formally, we define the problem of entity resolution with quality guarantees [14], [15] as follows:…”

Section: Notation (mentioning)

confidence: 99%

“…As human work is usually more expensive than machine computation, HUMO aims to minimize the workload in D H while guaranteeing resolution quality. By quantifying human cost by the number of instance pairs in D H , we define the optimization problem of HUMO as follows [14], [15]:…”

Section: The HUMO Framework (mentioning)

confidence: 99%
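
The partitioning idea described in the statement above can be illustrated with a minimal sketch. It assumes each instance pair carries a machine similarity score in [0, 1] and that the workload is split by two thresholds; the names `partition_workload`, `human_cost`, `lower`, and `upper` are hypothetical and do not come from the cited papers, which choose the human region by estimating match proportions rather than by fixed thresholds.

```python
from typing import List, Tuple

# Hypothetical representation: (pair identifier, machine similarity score in [0, 1]).
Pair = Tuple[str, float]


def partition_workload(
    pairs: List[Pair], lower: float, upper: float
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Split an ER workload into three parts by similarity score.

    Pairs scoring below `lower` go to the machine as non-matches (D-),
    pairs scoring above `upper` go to the machine as matches (D+),
    and everything in between is left for human inspection (D_H).
    The thresholds are illustrative assumptions, not values from the source.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("expected 0 <= lower <= upper <= 1")
    d_minus = [p for p in pairs if p[1] < lower]
    d_human = [p for p in pairs if lower <= p[1] <= upper]
    d_plus = [p for p in pairs if p[1] > upper]
    return d_minus, d_human, d_plus


def human_cost(d_human: List[Pair]) -> int:
    """Human cost measured as the number of instance pairs in D_H."""
    return len(d_human)


if __name__ == "__main__":
    workload = [("a", 0.1), ("b", 0.5), ("c", 0.9), ("d", 0.3)]
    d_minus, d_human, d_plus = partition_workload(workload, 0.3, 0.7)
    print(len(d_minus), human_cost(d_human), len(d_plus))
```

Narrowing the interval between the two thresholds reduces the human cost, but it leaves more pairs to the machine, which is where the precision and recall guarantees have to be checked.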

“…3 is challenging because the proportions of equivalent pairs in D + and D − are unknown, thus need to be estimated. There exist two types of approaches to minimize the size of D H : one purely based on the monotonicity assumption of precision and the other one based on sampling [14], [15]. They estimate equivalence proportion based on different assumptions.…”

Section: The HUMO Framework (mentioning)

confidence: 99%

“…The approach based on active learning [12], [13] can maximize recall while ensuring a pre-specified precision level. More recently, a HUman-Machine cOoperation framework [14], [15], denoted by HUMO, has been proposed to enforce more comprehensive quality guarantees at both precision and recall fronts. HUMO enables a flexible mechanism for quality control by partitioning an ER workload between the human and the machine.…”

Section: Introduction (mentioning)

confidence: 99%

r-HUMO: A Risk-Aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees

Hou, Chen, et al. 2020

IEEE Trans. Knowl. Data Eng.

Self Cite

Even though many approaches have been proposed for entity resolution (ER), it remains very challenging to enforce quality guarantees. To this end, we propose a risk-aware HUman-Machine cOoperation framework for ER, denoted by r-HUMO. Built on the existing HUMO framework, r-HUMO similarly enforces both precision and recall guarantees by partitioning an ER workload between the human and the machine. However, r-HUMO is the first solution that optimizes the process of human workload selection from a risk perspective. It iteratively selects human workload by real-time risk analysis based on the human-labeled results as well as the pre-specified machine metric. In this paper, we first introduce the r-HUMO framework and then present the risk model to prioritize the instances for manual inspection. Finally, we empirically evaluate r-HUMO's performance on real data. Our extensive experiments show that r-HUMO is effective in enforcing quality guarantees, and compared with the state-of-the-art alternatives, it can achieve desired quality control with reduced human cost.
