Proceedings of the 35th Annual ACM Symposium on Applied Computing 2020
DOI: 10.1145/3341105.3375776

Schema-agnostic blocking for streaming data

Cited by 7 publications (12 citation statements)
References 10 publications
“…In the ER context, we can highlight only the recent work proposed in [19], which addresses challenges related to streaming data and incremental processing. The authors of [19] propose a Spark-based blocking technique to handle heterogeneous streaming data. As previously stated, the present research is an evolution of the work in [19].…”
Section: Blocking In the Streaming Data Context (mentioning)
confidence: 99%
“…The authors of [19] propose a Spark-based blocking technique to handle heterogeneous streaming data. As previously stated, the present research is an evolution of the work in [19]. Overall, it is possible to highlight the following improvements: (i) an efficient workflow able to address the memory consumption problems present in [19], which decrease the technique's efficiency, as well as its ability, to process large amounts of data; (ii) an attribute selection algorithm, which discards superfluous attributes to enhance efficiency and minimize memory consumption; (iii) a top-n neighborhood strategy, which maintains only the "n" most similar neighbor entities of each entity; (iv) a noise-tolerant algorithm, which allows the proposed technique to generate high-quality blocks, even in the presence of noisy data; and (v) a parallel architecture for blocking streaming data, which divides all the blocking processes among two components (sender and blocking task) to enhance efficiency.…”
Section: Blocking In the Streaming Data Context (mentioning)
confidence: 99%
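
The top-n neighborhood strategy named in item (iii) of the statement above can be illustrated with a small sketch. It is not the cited authors' implementation: the tokenization, the use of Jaccard similarity, and the names `tokenize` and `top_n_neighbors` are all assumptions made for illustration. Entities that share at least one token are treated as candidate neighbors (schema-agnostic token blocking), and each entity keeps only its n most similar candidates.

```python
import heapq
from collections import defaultdict


def tokenize(entity):
    """Collect lowercase tokens from every attribute value.

    Attribute names are ignored, which is what makes the approach
    schema-agnostic. Whitespace splitting is an assumption.
    """
    tokens = set()
    for value in entity.values():
        tokens.update(str(value).lower().split())
    return tokens


def top_n_neighbors(entities, n):
    """Return, for each entity id, its n most similar neighbor ids.

    Similarity is Jaccard over token sets (an assumption; the quoted
    statement does not say which measure is used).
    """
    tokens = {eid: tokenize(entity) for eid, entity in entities.items()}

    # Token blocking: every token maps to the entities that contain it.
    blocks = defaultdict(set)
    for eid, toks in tokens.items():
        for tok in toks:
            blocks[tok].add(eid)

    neighbors = {}
    for eid, toks in tokens.items():
        # Candidates are entities sharing at least one block with eid.
        candidates = set()
        for tok in toks:
            candidates |= blocks[tok]
        candidates.discard(eid)

        scored = [
            (len(toks & tokens[c]) / len(toks | tokens[c]), c)
            for c in candidates
        ]
        # Keep only the n highest-scoring neighbors, best first.
        neighbors[eid] = [c for _, c in heapq.nlargest(n, scored)]
    return neighbors


# Hypothetical entities with differing schemas.
entities = {
    "a": {"name": "john smith", "city": "paris"},
    "b": {"full_name": "john smith"},
    "c": {"n": "john doe", "loc": "paris"},
    "d": {"title": "unrelated"},
}
result = top_n_neighbors(entities, n=1)
```

Bounding each neighbor list at n caps the memory held per entity, which is the motivation the statement gives for the strategy.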