Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery 2004
DOI: 10.1145/1008694.1008697
Iterative record linkage for cleaning and integration

Abstract: Record linkage, the problem of determining when two records refer to the same entity, has applications for both data cleaning (deduplication) and for integrating data from multiple sources. Traditional approaches use a similarity measure that compares tuples' attribute values; tuples with similarity scores above a certain threshold are declared to be matches. While this method can perform quite well in many domains, particularly domains where there is not a large amount of noise in the data, in some domains lo…
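The traditional approach the abstract describes — compare attribute values pairwise and declare a match above a threshold — can be sketched as follows. This is a minimal illustration, not the paper's method: the records, the attribute names, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.85 cutoff are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical records; the field names and values are illustrative.
records = [
    {"id": 1, "name": "Jon Smith", "city": "Boston"},
    {"id": 2, "name": "John Smith", "city": "Boston"},
    {"id": 3, "name": "Jane Doe", "city": "Chicago"},
]

def similarity(a, b):
    """Average string similarity over the shared attributes."""
    fields = ["name", "city"]
    scores = [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(scores) / len(scores)

THRESHOLD = 0.85  # assumed cutoff; the paper does not prescribe a value

# Compare every pair once; pairs above the threshold are declared matches.
matches = [
    (a["id"], b["id"])
    for i, a in enumerate(records)
    for b in records[i + 1:]
    if similarity(a, b) >= THRESHOLD
]
print(matches)  # [(1, 2)]
```

As the abstract notes, this works well in low-noise domains; the paper's motivation is the case where a single static threshold on attribute similarity is not enough.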

Cited by 143 publications (122 citation statements)
References 20 publications
“…Iterative approaches [8,14] identified the need to transitively compare merged records to discover more matches, for merges that are simple groupings of the data in merged records. Our approach allows richer, "custom" merges.…”
Section: Related Work
confidence: 99%
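The iterative idea this statement attributes to [8,14] — re-compare merged records (simple groupings of their members' values) so that merging can expose further, transitive matches — can be sketched as a fixed-point loop. This is an illustrative toy, not the paper's algorithm: the records and the share-a-value similarity test are assumptions.

```python
def similar(a, b, min_shared=1):
    # Toy measure: two clusters match if they share at least `min_shared` values.
    return len(a & b) >= min_shared

def iterative_link(records):
    """Merge matching clusters until no further merges occur."""
    clusters = [set(r) for r in records]
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if similar(clusters[i], clusters[j]):
                    clusters[i] |= clusters[j]  # merge = union of the grouped values
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break  # restart comparisons against the newly merged cluster
    return clusters

records = [
    {"J. Smith", "SIGMOD"},
    {"John Smith", "SIGMOD"},  # matches record 0 (shares "SIGMOD")
    {"John Smith", "VLDB"},    # shares nothing with record 0, but joins the
                               # cluster once records 0 and 1 have merged
]
print(len(iterative_link(records)))  # 1: all three records end in one cluster
```

The third record never matches the first directly; it is only discovered after the first merge, which is the transitive effect the citing paper describes.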
“…The static active learning and weakly labeled non-duplicates methods were used for training data (Singla and Domingos, 2005). An algorithm for discriminative learning of MLN parameters by combining the voted perceptron with a weighted satisfiability solver was proposed by Bhattacharya and Getoor (2004). An iterative deduplication algorithm was proposed by Bilenko and Mooney (2003), which is used to detect and remove duplicate entities from heterogeneous data sources.…”
Section: Related Work
confidence: 99%
“…Meanwhile, on criminal [131], epidemiology [130], financial [124], and linked data networks [141] [125] [128], node-related techniques have been used. As for link-related approaches, they also examined the data management [133], digital libraries [137], and lexical networks [134]. Besides the biological [143] [144] [147] and social networks [145] [146], graph-related tasks have been applied also on software behavior networks [142].…”
Section: Development and Tasks
confidence: 99%