2013
DOI: 10.1007/978-3-642-38709-8_10

A Hybrid Model Words-Driven Approach for Web Product Duplicate Detection

Abstract: The detection of product duplicates is one of the challenges that Web shop aggregators are currently facing. In this paper, we focus on solving the problem of product duplicate detection on the Web. Our proposed method extends a state-of-the-art solution that uses the model words in product titles to find duplicate products. First, we employ the aforementioned algorithm in order to find matching product titles. If no matching title is found, our method continues by computing similarities between the two produc…
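The first stage described in the abstract relies on model words in product titles. As a rough illustration only, the Python sketch below extracts candidate model words from a title. It assumes a model word is a token that mixes letters and digits (optionally with special characters), using a regular expression loosely modelled on the definition used in this line of work. The exact pattern, the lower-casing, and the function name `model_words` are assumptions of this sketch, not the paper's specification.

```python
from __future__ import annotations

import re

# Assumed model-word pattern (hypothetical for this sketch): a token that mixes
# alphanumeric characters with digits, e.g. "un46es6580", "1080p" or '46"'.
MODEL_WORD = re.compile(
    r"[a-zA-Z0-9]*(?:(?:[0-9]+[^0-9, ]+)|(?:[^0-9, ]+[0-9]+))[a-zA-Z0-9]*"
)


def model_words(title: str) -> set[str]:
    """Return the lower-cased tokens of a title that match the assumed pattern."""
    words = set()
    for token in title.lower().split():
        # A token qualifies only if the whole token matches the pattern.
        if MODEL_WORD.fullmatch(token):
            words.add(token)
    return words


print(model_words('Samsung UN46ES6580 46" 1080p 240Hz 3D LED HDTV'))
```

Purely alphabetic tokens such as `samsung` or `hdtv` are skipped, while codes such as `un46es6580` survive. How TMWM turns shared or conflicting model words into a duplicate decision is not reproduced here.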

Cited by 13 publications (13 citation statements); references 9 publications.
“…When duplicates are indeed detected, the algorithm directly proceeds to the next product combination. However, in the evaluation of the algorithms [3], it is shown that TMWM has only a precision of 0.556, whereas HSM has a precision of 0.741. TMWM thus declares a false duplicate in 44% of the cases, and HSM further improves this to 26% for the set of products declared non-duplicates by TMWM.…”
Section: Related Work (mentioning, confidence: 96%)
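The percentages in this statement follow from the definition of precision, TP / (TP + FP): the share of declared duplicates that are false is FP / (TP + FP) = 1 - precision. A small Python check of that arithmetic, using only the two precision values quoted above:

```python
def false_duplicate_rate(precision: float) -> float:
    """Fraction of declared duplicates that are not true duplicates.

    precision = TP / (TP + FP), hence FP / (TP + FP) = 1 - precision.
    """
    return 1.0 - precision


for name, p in [("TMWM", 0.556), ("HSM", 0.741)]:
    print(f"{name}: {false_duplicate_rate(p):.1%} of declared duplicates are false")
```

This prints 44.4% for TMWM and 25.9% for HSM, which round to the 44% and 26% quoted.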
“…The Hybrid Similarity Method (HSM) in [3] builds on TMWM of [15] by including additional information given by the product attributes. As a first step, TMWM is used to determine if the two products under consideration are duplicates based on the product title.…”
Section: Related Work (mentioning, confidence: 99%)
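The two-stage decision described here (title first, attributes only when the titles do not settle it) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `exact_title_match` is a hypothetical stand-in for TMWM (normalized string equality), `jaccard_features` is a hypothetical attribute similarity over key-value pairs, and the 0.5 threshold is arbitrary. Only the control flow follows the description; none of these components are HSM's actual ones.

```python
from typing import Callable, Dict

Features = Dict[str, str]


def exact_title_match(t1: str, t2: str) -> bool:
    """Hypothetical TMWM stand-in: titles equal after lower-casing and collapsing whitespace."""
    return " ".join(t1.lower().split()) == " ".join(t2.lower().split())


def jaccard_features(f1: Features, f2: Features) -> float:
    """Hypothetical attribute similarity: Jaccard overlap of normalized key-value pairs."""
    s1 = {(k.lower().strip(), v.lower().strip()) for k, v in f1.items()}
    s2 = {(k.lower().strip(), v.lower().strip()) for k, v in f2.items()}
    if not s1 and not s2:
        return 0.0
    return len(s1 & s2) / len(s1 | s2)


def hybrid_is_duplicate(
    title1: str,
    features1: Features,
    title2: str,
    features2: Features,
    title_match: Callable[[str, str], bool] = exact_title_match,
    attribute_similarity: Callable[[Features, Features], float] = jaccard_features,
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> bool:
    # Stage 1: title-based decision; a match ends the comparison for this pair.
    if title_match(title1, title2):
        return True
    # Stage 2: reached only when the titles do not match; fall back to attributes.
    return attribute_similarity(features1, features2) >= threshold
```

Passing the two stages in as functions keeps the control flow separate from the similarity measures, so a real title matcher or attribute similarity could be swapped in without touching the decision logic.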
“…However, in real applications the problem of products matching is highly imbalanced. The approach is first extended in [1], using a hybrid similarity method. Later, the method is extended in [2], where hierarchical clustering is used for matching products from multiple web shops, using the same hybrid similarity method.…”
Section: Related Work (mentioning, confidence: 99%)
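For the multi-shop extension, a generic hierarchical clustering loop over a pairwise product similarity might look like the Python sketch below. The average linkage, the stopping threshold, and the rule that two products from the same web shop never end up in one cluster are assumptions of this sketch, not necessarily the choices made in [2]; `sim` stands in for the hybrid similarity method, and `cluster_products` is a hypothetical name.

```python
from typing import Callable, List, Sequence


def cluster_products(
    shops: Sequence[str],
    sim: Callable[[int, int], float],
    threshold: float,
) -> List[List[int]]:
    """Average-linkage agglomerative clustering over product indices.

    shops[i] is the web shop of product i. Two clusters that contain products
    from the same shop are never merged (assumed constraint). Merging stops when
    no admissible pair reaches the threshold.
    """
    clusters: List[List[int]] = [[i] for i in range(len(shops))]
    while True:
        best, pair = threshold, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                a, b = clusters[x], clusters[y]
                if {shops[i] for i in a} & {shops[j] for j in b}:
                    continue  # merging would put two products of one shop together
                avg = sum(sim(i, j) for i in a for j in b) / (len(a) * len(b))
                if avg >= best:
                    best, pair = avg, (x, y)
        if pair is None:
            return clusters
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected


# Usage: products 0, 1, 2 (shops A, B, C) are the same item; product 3 is another item in shop A.
shops = ["A", "B", "C", "A"]
same = {0, 1, 2}
print(cluster_products(shops, lambda i, j: 0.9 if {i, j} <= same else 0.1, 0.5))
```

Each resulting cluster is read as one real-world product; the quadratic pair scan is kept for clarity rather than speed.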