Proceedings of the 21st ACM International Conference on Information and Knowledge Management 2012
DOI: 10.1145/2396761.2396839

Matching product titles using web-based enrichment

Abstract: Matching product titles from different data feeds that refer to the same underlying product entity is a key problem in online shopping. This matching problem is challenging because titles across the feeds have diverse representations with some missing important keywords like brand and others containing extraneous keywords related to product specifications. In this paper, we propose a novel unsupervised matching algorithm that leverages web search engines to (1) enrich product titles by adding important missing…

Cited by 35 publications
(40 citation statements)
References 27 publications
“…However, these basic similarity functions cannot consistently perform well across the remaining three categories. Variations of the basic similarity approach, such as "Term Frequency × Inverse Document Frequency" (TF-IDF) [22] that has a global scope and stable ranking mechanism, can handle Category 3 fairly well, but fail in Category 4 for the reasons presented in [16].…”
Section: Unmatched Titles With High Degree Of Token Overlap
Citation type: mentioning
confidence: 99%
“…There have been some recent works such as [8] and [16] (EN+IMP) that try to overcome the above limitations of these basic approaches. Although these fare better than most standard approaches -see Category 4, their efficacy is limited.…”
Section: Unmatched Titles With High Degree Of Token Overlap
Citation type: mentioning
confidence: 99%