2022
DOI: 10.14569/ijacsa.2022.0131240

Tracking The Sensitivity of The Learning Models Toward Exact and Near Duplicates

Abstract: Most real-world datasets are contaminated by quality issues that severely affect analysis results. Duplication is one of the main quality issues that degrade these results. Different studies have tackled the duplication issue from different perspectives. However, the sensitivity of supervised and unsupervised learning models to different types of duplicates, deterministic and probabilistic, has not been broadly addressed. Furthermore, a simple metric is used to estimate the ratio of b…

citations
Cited by 0 publications
references
References 28 publications
(34 reference statements)