2004
DOI: 10.1007/978-3-540-24775-3_75
Febrl – A Parallel Open Source Data Linkage System

Cited by 87 publications (79 citation statements)
References 10 publications
“…Training-based matchers: BN [38], MOMA [55], SERF [5], Active Atlas [53,54], MARLIN [11,12], Multiple Classifier System [62], Operator Trees [13], TAILOR [24], FEBRL [18,17], STEM [36], Context Based Framework [16]. …between two entities. The previously proposed approaches mostly assume that corresponding attributes from the input datasets have been determined beforehand, either manually or with the help of schema matching.…”
Section: Matchers
confidence: 99%
“…The authors of [4] show how the match computation can be parallelized among several cores on a single node. Parallel evaluation of the Cartesian product of two sources is described in [8].…”
Section: Related Work
confidence: 99%
“…Additionally, while for many regular words there is only one correct spelling, there are often different written forms of proper names, for example 'Gail' and 'Gayle'. The main task of data cleaning and standardisation is the conversion of the raw input data into well defined, consistent forms and the resolution of inconsistencies in the way information is represented or encoded [9,10].…”
Section: Data Linkage Process
confidence: 99%
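One common way to handle written name variants of this kind is phonetic encoding, which collapses different spellings of the same name onto one code. The following hand-rolled Soundex sketch (an illustration, not Febrl's code) shows 'Gail' and 'Gayle' receiving identical codes:

```python
def soundex(name: str) -> str:
    """Classic Soundex: first letter plus three digits from consonant groups."""
    groups = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
              **dict.fromkeys("dt", "3"), "l": "4",
              **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    code = name[0].upper()
    prev = groups.get(name[0], "")
    for ch in name[1:]:
        digit = groups.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # 'h' and 'w' do not separate consonant groups
            prev = digit
    return (code + "000")[:4]

# Different written forms of the same name receive identical codes:
# soundex("Gail") == soundex("Gayle") == "G400"
```

Comparing phonetic codes instead of raw strings lets a linkage system treat such variants as candidate matches during cleaning and comparison.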
“…As discussed earlier, this is computationally feasible only for small data sets. In practice, blocking, filtering, indexing, searching, or sorting algorithms [2,9,15,21,23] are used to reduce the number of record pair comparisons as discussed in Section 2.1. The aim of such algorithms is to cheaply remove as many of the obvious non-matches from the set of non-matches U as possible, without removing any record pairs from the set of matches M.…”
Section: Blocking and Complexity Measures
confidence: 99%
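As an illustration of the blocking idea (a sketch with invented records, not Febrl's actual indexing code), standard blocking groups records by a cheap key and generates candidate pairs only within each block:

```python
from collections import defaultdict
from itertools import combinations

def block_pairs(records, blocking_key):
    """Yield only the record pairs that share a blocking-key value."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    for group in blocks.values():
        yield from combinations(group, 2)

records = [
    {"id": 1, "surname": "smith", "postcode": "2600"},
    {"id": 2, "surname": "smyth", "postcode": "2600"},
    {"id": 3, "surname": "jones", "postcode": "2600"},
]
# Block on (first letter of surname, postcode): only 1 candidate pair
# survives out of the 3 pairs a full pairwise comparison would generate.
key = lambda r: (r["surname"][0], r["postcode"])
candidates = list(block_pairs(records, key))
```

The obvious non-match smith/jones is never compared, while the likely match smith/smyth survives because both fall into the same block. Choosing a blocking key that never splits true matches across blocks is exactly the difficulty the quoted passage describes.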