2018
DOI: 10.7250/csimq.2018-16.04

Scalable Matching and Clustering of Entities with FAMER

Abstract: Entity resolution identifies semantically equivalent entities, e.g. describing the same product or customer. It is especially challenging for Big Data applications where large volumes of data from many sources have to be matched and integrated. We therefore introduce a scalable entity resolution framework called FAMER (FAst Multi-source Entity Resolution system) that is based on Apache Flink for distributed execution and that can holistically match entities from multiple sources. For the latter purpose, FAMER …

Cited by 26 publications (30 citation statements)
References 28 publications
“…In the simplest case, Connected Components [80,153] is applied to compute the transitive closure of the detected matches. This naive approach increases recall, but is rather sensitive to noise.…”
Section: Clustering Methods
Citation type: mentioning
confidence: 99%
“…The final task in the end-to-end ER workflow is Clustering [80,126,153-155], which groups together the identified matches such that all descriptions within a cluster match. Its goal is actually to infer indirect matching relations among the detected pairs of matching descriptions so as to overcome possible limitations of the employed similarity functions.…”
Section: Q3
Citation type: mentioning
confidence: 99%
“…For Dirty ER, the simplest approach is Connected Components [31,32], which sets a cut-off threshold t and considers as matches all comparisons with a similarity score higher than t; then, it estimates the transitive closure of the matches. For higher robustness to noise, more advanced algorithms build clusters around selected entities that operate as centers.…”
Section: Entity Clustering (EC)
Citation type: mentioning
confidence: 99%
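
The Connected Components approach described in the statement above can be illustrated with a minimal sketch. This is not FAMER's implementation (FAMER runs on Apache Flink); it is a single-machine illustration in Python that uses a union-find structure, which is one common way to compute a transitive closure. The function name `connected_components`, the `(entity_a, entity_b, score)` input format, and the sample scores are all assumptions made for this example.

```python
from collections import defaultdict


def connected_components(scored_pairs, t):
    """Cluster entities: keep pairs with similarity > t, then take the
    transitive closure of the kept matches (hypothetical helper)."""
    parent = {}

    def find(x):
        # Register unseen entities as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        root_a, root_b = find(a), find(b)  # registers both entities
        if score > t and root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = defaultdict(set)
    for entity in parent:
        clusters[find(entity)].add(entity)
    return sorted(sorted(members) for members in clusters.values())


# Illustrative similarity scores (made up for this sketch).
pairs = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.3), ("d", "e", 0.95)]
print(connected_components(pairs, 0.5))
# [['a', 'b', 'c'], ['d', 'e']]

# A single spurious match above the threshold merges both clusters.
noisy = pairs + [("c", "d", 0.6)]
print(connected_components(noisy, 0.5))
# [['a', 'b', 'c', 'd', 'e']]
```

In the first call, "a" and "c" land in the same cluster although they were never compared directly, which is the indirect matching that the transitive closure infers. The second call shows the sensitivity to noise: one false link is enough to collapse two clusters into one.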
“…The first category includes the open-source tools that are crafted for structured data, namely Magellan [3], Dedupe [68], DuDe [69], Febrl [65], FRIL [70], OYSTER [71], Record Linkage [72] and FAMER [32]. All of them apply a budget-agnostic, schema-based end-to-end workflow that typically consists of two steps: Blocking and Matching.…”
Section: Related Work
Citation type: mentioning
confidence: 99%