Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2009
DOI: 10.1145/1557019.1557144

Address standardization with latent semantic association

Abstract: Address standardization is a very challenging task in data cleansing. To provide better customer relationship management and business intelligence for customer-oriented corporations, millions of free-text addresses need to be converted to a standard format for data integration, de-duplication and householding. Existing commercial tools usually employ many hand-crafted, domain-specific rules and reference data dictionaries of cities, states, etc. These rules work better for the regions they were designed for. However, ru…

Citations: cited by 12 publications (8 citation statements)
References: 19 publications
“…For instance in (Kothari et al, 2010) a new address standardization technique for countries such as India is shown which is able to deal with substantial variants in address structures within a country. Also for addresses, (Guo et al, 2009) proposes latent semantic association instead of the traditional rule and dictionary-based approaches for data standardization. The data standardization step is typically applied before matching and data consolidation algorithms are applied to match records with the intent to find duplicates.…”
Section: Dq Methods For Improvement (mentioning)
confidence: 99%
“…-Company. We also crawl addresses on two food review websites (footnotes 2, 3) and one company information query website (footnote 4). This database contains 10k company addresses.…”
Section: Datasets and Metrics (mentioning)
confidence: 99%
“…The process of translating manually written addresses into a certain digital format is known as address standardization. There are also some researches on Address Standardization [4,6,8]. A method based on trie-tree and finite state machine is proposed in [10] which focuses on the problem of inaccurate word segmentation.…”
Section: Table 3 An Example Of Web Contexts (mentioning)
confidence: 99%
“…The aim of this model is to extract hidden (unknown) information from a string of visible parameters. Particularly novel is the work of Guo et al (2009), which analyzes postal addresses using a model of Latent Semantic Association (LaSA). LaSA model is built to minimize the human efforts and the size of the control data.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some tests to solve the problem of normalization of addresses were done years ago, but the greatest difficulty was the necessary computing power, not very developed at that time (Fernández et al, 1993). Current processors overcome this difficulty and, besides, new studies emerge every day analyzing the feasibility of different algorithms for data management (Navarro et al, 2003; Patman and Thompson, 2003; Christen and Belacic, 2005; Guo et al, 2009). Although these studies are not applied to Bibliometrics, they employ different techniques that can be used for present and future improvements.…”
Section: Introduction (mentioning)
confidence: 99%