2016
DOI: 10.14778/2994509.2994535

Magellan

Abstract: Entity matching (EM) has been a long-standing challenge in data management. Most current EM work focuses only on developing matching algorithms. We argue that far more effort should be devoted to building EM systems. We discuss the limitations of current EM systems, then present as a solution Magellan, a new kind of EM system. Magellan is novel in four important aspects. (1) It provides how-to guides that tell users what to do in each EM scenario, step by step. (2) It provides tools to help users do these ste…

Cited by 181 publications (13 citation statements). References: 22 publications.

“…To verify the validity of the proposed model, we also collect the dataset [29] for the experimental study of Entity Matching (EM) published by SIGMOD in 2018, and this dataset includes tabular data in the fields of bibliography, music, e-commerce, etc. Moreover, Magellan [30], Corleone [31], Falcon [32], and other datasets that can also be used for EM experiments are also adopted. Because these datasets involve common fields, they are merged and re-divided to obtain two new English datasets.…”
Section: Datasets (mentioning, confidence: 99%)

“…Following the seminal Fellegi-Sunter model for record linkage [16], a major focus of prior work has been on classifying pairs of input records as match, non-match, or potential match. While even some of the early work on record linkage incorporated a 1-1 matching constraint [62], the primary focus of most recent works has been on the effectiveness of the classification task, mainly by leveraging machine [28] and deep learning [5,35,40] methods.…”
Section: Related Work (mentioning, confidence: 99%)

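The statement above refers to the Fellegi-Sunter model, which scores a record pair field by field and classifies it as match, non-match, or potential match, and to a 1-1 matching constraint. The following is a minimal sketch of both, assuming conditionally independent fields and hand-picked m- and u-probabilities and thresholds (all hypothetical values; a real system would estimate them from data, for example with expectation-maximization). For the 1-1 step it swaps in a simple greedy heuristic. An optimal assignment would instead use something like the Hungarian algorithm. `fs_weight`, `classify`, and `greedy_one_to_one` are illustrative names, not functions from any cited work.

```python
import math
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical per-field parameters, chosen only for illustration:
#   m = P(field agrees | pair is a true match)
#   u = P(field agrees | pair is a non-match)
M_PROBS = {"name": 0.95, "city": 0.85, "zip": 0.90}
U_PROBS = {"name": 0.01, "city": 0.10, "zip": 0.05}


def fs_weight(a: Record, b: Record) -> float:
    """Fellegi-Sunter match weight: sum of per-field log2 likelihood ratios."""
    total = 0.0
    for field, m in M_PROBS.items():
        u = U_PROBS[field]
        if a.get(field) == b.get(field):
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight: float, lower: float = 0.0, upper: float = 8.0) -> str:
    """Two-threshold rule: match, non-match, or potential match (for clerical review)."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "potential match"


def greedy_one_to_one(scored: List[Tuple[str, str, float]]) -> List[Tuple[str, str, float]]:
    """Greedy 1-1 constraint: take highest-weight pairs first, use each record at most once."""
    used_a, used_b, kept = set(), set(), []
    for a_id, b_id, w in sorted(scored, key=lambda t: t[2], reverse=True):
        if a_id in used_a or b_id in used_b:
            continue  # one side is already matched to a higher-weight partner
        used_a.add(a_id)
        used_b.add(b_id)
        kept.append((a_id, b_id, w))
    return kept


r1 = {"name": "acme corp", "city": "madison", "zip": "53706"}
r2 = {"name": "acme corp", "city": "madison", "zip": "53706"}
r3 = {"name": "acme corp", "city": "verona", "zip": "53593"}
print(classify(fs_weight(r1, r2)))  # → match (all fields agree, weight ≈ 13.8)
print(classify(fs_weight(r1, r3)))  # → potential match (only name agrees, weight ≈ 0.74)
```

The middle band between the two thresholds is what produces the "potential match" class. Widening it sends more pairs to manual review in exchange for fewer automatic errors.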
“…The user can choose among several workflows and configure each step through both graphical and programming interfaces. Magellan [6] is a popular state-of-the-art ecosystem of entity matching tools for data scientists. Users can try out different blockers and matchers, utilize built-in debugging helpers, and use the provided guides to work through the process.…”
Section: Background and State Of The Art (mentioning, confidence: 99%)

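The statement above describes trying out different blockers and matchers, the two main stages of an EM workflow. A minimal sketch in plain Python follows: an overlap blocker prunes the cross product of two tables to candidate pairs, then a threshold matcher labels each candidate. `overlap_blocker`, `jaccard`, and `threshold_matcher` are hypothetical helpers written for illustration. They are not Magellan's API, and Magellan's own blockers and (often learning-based) matchers are more elaborate. The toy product tables are invented.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens."""
    return set(text.lower().split())


def overlap_blocker(table_a: List[Record], table_b: List[Record],
                    attr: str, min_overlap: int = 1) -> List[Tuple[Record, Record]]:
    """Keep (a, b) pairs whose `attr` values share at least `min_overlap` tokens.

    An inverted index over table_b avoids enumerating the full cross product.
    """
    index: Dict[str, Set[int]] = defaultdict(set)
    for j, b in enumerate(table_b):
        for tok in tokens(b[attr]):
            index[tok].add(j)
    candidates = []
    for a in table_a:
        shared: Dict[int, int] = defaultdict(int)  # row of table_b -> shared-token count
        for tok in tokens(a[attr]):
            for j in index.get(tok, ()):
                shared[j] += 1
        for j in sorted(shared):
            if shared[j] >= min_overlap:
                candidates.append((a, table_b[j]))
    return candidates


def jaccard(x: str, y: str) -> float:
    """Jaccard similarity of the two strings' token sets."""
    tx, ty = tokens(x), tokens(y)
    union = tx | ty
    return len(tx & ty) / len(union) if union else 1.0


def threshold_matcher(pairs: List[Tuple[Record, Record]], attr: str,
                      threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Rule-based matcher: a candidate pair matches if its Jaccard score >= threshold."""
    return [(a["id"], b["id"]) for a, b in pairs if jaccard(a[attr], b[attr]) >= threshold]


table_a = [{"id": "a1", "title": "apple iphone 12 64gb"},
           {"id": "a2", "title": "samsung galaxy s21"}]
table_b = [{"id": "b1", "title": "iphone 12 64gb black apple"},
           {"id": "b2", "title": "galaxy s21 ultra samsung"},
           {"id": "b3", "title": "dell xps 13 laptop"},
           {"id": "b4", "title": "apple watch series 6"}]

cands = overlap_blocker(table_a, table_b, "title", min_overlap=1)
print(len(cands), "candidate pairs out of", len(table_a) * len(table_b))  # → 3 candidate pairs out of 8
print(threshold_matcher(cands, "title"))  # → [('a1', 'b1'), ('a2', 'b2')]
```

Here the blocker lets (a1, b4) through on the single shared token "apple", and the matcher rejects it (Jaccard 1/7). This split is the usual design: a cheap, recall-oriented blocker followed by a more precise matcher on far fewer pairs.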
“…Of course, most of these pain points have been addressed to varying degrees by existing work, but few efforts treat them holistically. For example, Magellan [6] offers flexibility and a short-term iterative workflow, but it is not suited for domain experts and lacks long-term iterative workflows. While CloudMatcher [4] can be used by domain experts, [it is] rigid.…”
Section: Challenges For Industrial Use (mentioning, confidence: 99%)