Damir Vandic scite author profile

Kaymak

2013

FLOPPIES: A Framework for Large-Scale Ontology Population of Product Information from Tabular Data in E-commerce Stores

Nederstigt

Aanen

Decision Support Systems

et al. 2014

Automated product taxonomy mapping in an e-commerce environment

Aanen

Expert Systems with Applications

2015

Dynamic Facet Ordering for Faceted Product Search Engines

Aanen

IEEE Trans. Knowl. Data Eng.

et al. 2017

A Hybrid Model Words-Driven Approach for Web Product Duplicate Detection

Bakker

2013

The detection of product duplicates is one of the challenges that Web shop aggregators are currently facing. In this paper, we focus on solving the problem of product duplicate detection on the Web. Our proposed method extends a state-of-the-art solution that uses the model words in product titles to find duplicate products. First, we employ the aforementioned algorithm in order to find matching product titles. If no matching title is found, our method continues by computing similarities between the two product descriptions. These similarities are based on the product attribute keys and on the product attribute values. Furthermore, instead of only extracting model words from the title, our method also extracts model words from the product attribute values. Based on our experimental results on real-world data gathered from two existing Web shops, we show that the proposed method, in terms of F1-measure, significantly outperforms the existing state-of-the-art title model words method and the well-known TF-IDF method.
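The two-stage idea in this abstract (match on title model words first, then fall back to attribute keys and values) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a model word is a token mixing letters and digits, that products are plain dictionaries with `title` and `attrs` fields, and that the two attribute similarities are averaged against a hypothetical threshold of 0.5. All function names are illustrative.

```python
import re

# Assumption: a "model word" is a token containing both letters and digits,
# e.g. "ue40h6400". The paper's exact definition may differ.
MODEL_WORD = re.compile(r"^(?=.*[a-z])(?=.*[0-9])[a-z0-9\-]+$")


def model_words(text):
    """Return the set of lower-cased tokens that look like model words."""
    tokens = re.split(r"[^a-z0-9\-]+", text.lower())
    return {t for t in tokens if MODEL_WORD.match(t)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_duplicate(p1, p2, threshold=0.5):
    """Hypothetical two-stage check: titles first, attributes as fallback."""
    # Stage 1: any shared model word in the titles counts as a match.
    if model_words(p1["title"]) & model_words(p2["title"]):
        return True
    # Stage 2: compare attribute keys, and model words found in attribute values.
    key_sim = jaccard(set(p1["attrs"]), set(p2["attrs"]))
    values1 = set().union(*(model_words(v) for v in p1["attrs"].values()))
    values2 = set().union(*(model_words(v) for v in p2["attrs"].values()))
    value_sim = jaccard(values1, values2)
    # Assumption: equal weighting of both similarities.
    return (key_sim + value_sim) / 2 >= threshold
```

The threshold and the equal weighting are placeholders; in practice they would be tuned on labelled product pairs.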


Multi-component similarity method for web product duplicate detection

Bezu

Borst

Rijkse

et al. 2015

Due to the growing number of Web shops, aggregating product data from the Web is growing in importance. One of the problems encountered in product aggregation is duplicate detection. In this paper, we extend and significantly improve an existing state-of-the-art product duplicate detection method. Our approach employs a novel method for combining the titles' and the attributes' similarities into a final product similarity. We use q-grams to handle partial matching of words, such as abbreviations. Where existing methods cluster products of only two Web shops, we propose a hierarchical clustering method to handle multiple Web shops. Applying our new method to a dataset of TVs from four Web shops reveals that it significantly outperforms the Hybrid Similarity Method, the Title Model Words Method, and the well-known TF-IDF method, with an F1 score of 0.475 compared to 0.287, 0.298, and 0.335, respectively.
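The q-gram idea mentioned in this abstract, scoring partial matches such as abbreviations, can be sketched as follows. This is an illustration only: it assumes a Dice coefficient over padded character q-grams with `q=3`, which may not be the exact similarity used in the paper, and the function names are hypothetical.

```python
from collections import Counter


def qgrams(word, q=3):
    """Return overlapping character q-grams, padded with '#' at both ends."""
    padded = "#" * (q - 1) + word.lower() + "#" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def qgram_similarity(a, b, q=3):
    """Dice coefficient over the multisets of q-grams of two words."""
    grams_a, grams_b = Counter(qgrams(a, q)), Counter(qgrams(b, q))
    shared = sum((grams_a & grams_b).values())  # multiset intersection size
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2 * shared / total
```

Under this sketch an abbreviation such as "res" shares its leading q-grams with "resolution" and so scores above an unrelated word such as "weight", which shares none.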


TaxoLearn: A Semantic Approach to Domain Taxonomy Learning

Dietz

2012