2017
DOI: 10.1016/j.jss.2017.03.003

An efficient spark-based adaptive windowing for entity matching

Cited by 20 publications (12 citation statements); references 8 publications.

“…It works on the concept of document collections and provides a flexible storage architecture. PostGIS 12 is an open source spatial database extension for PostgreSQL Object-Relational Database Management System (ORDBMS) that adds support for geographic objects and follows the Open Geospatial Consortium's "Simple Features for SQL Specification" [50]. It provides several features such as processing of vector and raster data, spatial reprojection, import/export of Environmental Systems Research Institute (ESRI) shapefiles, 3D object support.…”
Section: Data Storage and Management (mentioning)
confidence: 99%
“…It comprehensively describes the ICT solution proposed in the EUBra-BIGSEA context, the developed services and algorithms from an application-centric perspective. In this respect, it focuses on data quality and privacy aspects as well as descriptive analytics and graphical user interface (GUI) implementation details, thus providing a novel contribution with respect to previous work ([10], [12]-[15]).…”
(Footnote 1: https://www.eubra-bigsea.eu/ - last visited in July 2019.)
Section: Introduction (mentioning)
confidence: 99%
“…[3], [12], and [13]. Only some initial approaches consider the use of the Apache Spark framework for distributed ER [14], [15]. FAMER utilizes Apache Flink which is similar to Apache Spark and both frameworks improve on MapReduce due to a better utilization of in-memory processing and better support for iterative algorithms as needed for clustering [16].…”
Section: Related Work (mentioning)
confidence: 99%
“…In the processing of source-inconsistent clusters/components we sequentially process the intra-component links (lines 12-15) in the order of their maximal link priority (determined by sortLinksByPriority in line 11), which is based on the link similarity value, link strength and link degree. The parameter config in line 11 determines the weight of these three factors to compute the link priority.…”
Section: Clip (mentioning)
confidence: 99%
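The link-ordering step quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited CLIP implementation: it assumes each of the three factors (similarity, strength, degree) is already normalized to [0, 1] and that the priority is a plain weighted sum whose weights come from a `config` mapping. The key names `w_sim`, `w_strength` and `w_degree`, the `Link` class, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    src: str
    dst: str
    similarity: float  # assumed normalized to [0, 1]
    strength: float    # hypothetical numeric encoding of link strength in [0, 1]
    degree: float      # assumed normalized to [0, 1]


def link_priority(link: Link, config: dict) -> float:
    """Hypothetical priority: weighted sum of the three link factors."""
    return (config["w_sim"] * link.similarity
            + config["w_strength"] * link.strength
            + config["w_degree"] * link.degree)


def sort_links_by_priority(links: list, config: dict) -> list:
    """Return links ordered from highest to lowest priority.

    Python's sorted() is stable, so links with equal priority keep
    their input order.
    """
    return sorted(links, key=lambda l: link_priority(l, config), reverse=True)


if __name__ == "__main__":
    cfg = {"w_sim": 0.5, "w_strength": 0.3, "w_degree": 0.2}
    links = [
        Link("a1", "b1", 0.90, 1.0, 0.5),  # priority 0.85
        Link("a2", "b2", 0.95, 0.0, 0.0),  # priority 0.475
        Link("a3", "b3", 0.50, 1.0, 1.0),  # priority 0.75
    ]
    for link in sort_links_by_priority(links, cfg):
        print(link.src, link.dst, round(link_priority(link, cfg), 3))
```

Changing the weights in `cfg` shifts how much a strong or well-connected link can outrank a merely similar one, which mirrors the role the quoted text gives to the `config` parameter.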
“…The SparkER tool by Gagliardelli et al. [22] uses LSH, meta-blocking, and a block purging process to remove high-frequency blocking keys. Mestre et al. [23] presented a sorted neighborhood implementation with an adaptive window size, which uses three Spark transformation steps to distribute the data and minimize data skew.…”
Section: Introduction (mentioning)
confidence: 99%
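To illustrate what an adaptive window adds to the sorted neighborhood method, here is a single-machine sketch. It is not the Spark implementation described above. The growth rule is an assumption in the style of a duplicate-count strategy: the window starts at size `w` (the record plus `w - 1` successors) and is extended by one record whenever, at the current window boundary, the fraction of matches among comparisons made so far is at least `phi`. The function name, its parameters, and the toy `is_match` predicate are hypothetical.

```python
from typing import Callable, List, Tuple


def adaptive_sorted_neighborhood(
    records: List[str],
    key: Callable[[str], str],
    is_match: Callable[[str, str], bool],
    w: int = 3,
    phi: float = 0.5,
) -> List[Tuple[str, str]]:
    """Sorted neighborhood with a hypothetical duplicate-count window rule."""
    ordered = sorted(records, key=key)  # sort by the blocking key
    n = len(ordered)
    pairs: List[Tuple[str, str]] = []
    for i in range(n):
        window_end = min(i + w, n)  # exclusive bound: w - 1 successors initially
        comparisons = 0
        duplicates = 0
        j = i + 1
        while j < window_end:
            comparisons += 1
            if is_match(ordered[i], ordered[j]):
                duplicates += 1
                pairs.append((ordered[i], ordered[j]))
            j += 1
            # At the window boundary, grow by one record while the
            # observed duplicate rate stays at or above phi.
            if j == window_end and window_end < n and duplicates / comparisons >= phi:
                window_end += 1
    return pairs


if __name__ == "__main__":
    same_prefix = lambda a, b: a[:3] == b[:3]  # toy match rule for illustration
    data = ["aaa1", "aaa2", "aaa3", "bbb"]
    print(adaptive_sorted_neighborhood(data, key=lambda s: s, is_match=same_prefix, w=2))
```

With `w=2`, a fixed window would only compare adjacent records and miss the pair `("aaa1", "aaa3")`; the adaptive rule extends the window across the run of matches and finds it. Distributing this loop over Spark partitions (and handling windows that cross partition boundaries) is the harder part that the cited work addresses and is not shown here.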