Proceedings of the 30th Annual ACM Symposium on Applied Computing 2015
DOI: 10.1145/2695664.2695818

Multi-component similarity method for web product duplicate detection

Abstract: Due to the growing number of Web shops, aggregating product data from the Web is growing in importance. One of the problems encountered in product aggregation is duplicate detection. In this paper, we extend and significantly improve an existing state-of-the-art product duplicate detection method. Our approach employs a novel method for combining the titles' and the attributes' similarities into a final product similarity. We use q-grams to handle partial matching of words, such as abbreviations. Where existin…
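To make the q-gram idea in the abstract concrete, here is a minimal sketch of how overlapping character q-grams can give partial credit to abbreviations and inflected forms. It is an illustration only: the paper's actual q-gram measure, its value of q, and the way title and attribute similarities are combined are not stated in this excerpt. The function names `qgrams` and `qgram_similarity`, the `#` padding character, and the choice of Jaccard overlap are all assumptions made for this example.

```python
def qgrams(text: str, q: int = 3) -> set[str]:
    """Return the set of overlapping character q-grams of a string.

    The string is lowercased and padded with q-1 '#' characters on each
    side so that prefixes and suffixes contribute their own q-grams.
    (Padding and lowercasing are assumptions of this sketch.)
    """
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Jaccard overlap of the two q-gram sets, a value in [0, 1].

    Jaccard is one common choice; the paper may use a different measure.
    """
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # A shortened form still shares several q-grams with the full word,
    # whereas an exact token comparison would score it as a mismatch.
    print(qgram_similarity("inch", "inches"))
    print(qgram_similarity("inch", "hertz"))
```

Under these assumptions, "inch" and "inches" share the q-grams that cover their common prefix and so receive a non-zero score, while unrelated words such as "inch" and "hertz" share none.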

Cited by 21 publications (21 citation statements)
References 14 publications (23 reference statements)
“…Similar to product classification, typically product linking will make use of product names (e.g., Kannan et al (2011); Gopalakrishnan et al (2012); Vandic et al (2012); van Bezu et al (2015); Shah et al (2018); Tracz et al (2020); Li et al (2020)) and descriptions (e.g., Petrovski et al (2014); Ristoski et al (2018); Li et al (2020)). The difference however, is that the task also makes use of a diverse range of structured product attributes (e.g., van Bezu et al (2015); Shah et al (2018); Petrovski and Bizer (2020); Li et al (2020)), often defined as 'key-value' pairs such as those that can be extracted from product specifications (e.g., product ID, model, brand, manufacturer). Intuitively, offers that have the similar sets of key-value pairs are more likely to match.…”
Section: Product Linking
Citation type: mentioning
confidence: 99%
“…Feature representation. Again, similar to product classification, broadly speaking, transforming textual metadata into feature representations is typically based on BoW (e.g., Vandic et al (2012); van Bezu et al (2015)), pre-trained word embeddings or language models (e.g., Ristoski et al (2018); Shah et al (2018); Li et al (2020); Peeters et al (2020a); Tracz et al (2020)), or learning word embeddings on-the-spot from the downstream task datasets (e.g., Shah et al (2018)). However, depending on the types of metadata, different methods may be adopted and then combined (Köpcke et al (2012)).…”
Section: Product Linking
Citation type: mentioning
confidence: 99%
“…The approach is based on several string similarity functions for product matching. The approach is extended by using a hybrid similarity method and hierarchical clustering for matching products from multiple e-shops [1].…”
Section: Product Matching
Citation type: mentioning
confidence: 99%
“…The approach is first extended in [1], using a hybrid similarity method. Later, the method is extended in [2], where hierarchical clustering is used for matching products from multiple web shops, using the same hybrid similarity method.…”
Section: Related Work
Citation type: mentioning
confidence: 99%