Asif Sohail scite author profile

Asif Sohail

3 Publications

4 Citation Statements Received

36 Citation Statements Given


Affiliations

Information Technology University, University of the Punjab

Publications


Locality sensitive blocking (LSB): A robust blocking technique for data deduplication

Sohail, Qounain (2022). Journal of Information Science


Data deduplication is the process of discovering multiple representations of the same entity in an information system. Blocking has been a benchmark technique for avoiding exhaustive pairwise record comparisons in data deduplication. Standard blocking (SB) aims to place potential duplicate records in the same block on the basis of a blocking key; detailed comparisons are then made only among the records residing in the same block. Selecting a blocking key is a tedious process because the number of alternatives grows exponentially, and the outcome of SB varies considerably with the choice of blocking key. To this end, we propose a robust blocking technique called Locality Sensitive Blocking (LSB) that does not require the selection of a blocking key. The experimental results show an increase of up to 0.448 in F-score compared with SB. Furthermore, LSB is found to be more robust to blocking parameters and data noise.
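To make the contrast in this abstract concrete, here is a minimal Python sketch, assuming records are flat dictionaries of string fields. `standard_blocking` follows the SB description: records with the same blocking-key value share a block, and only pairs inside a block become candidates. `lsh_blocking` is a generic MinHash banding scheme over character trigrams, included only as an assumed illustration of key-free, locality-sensitive blocking; it is not the paper's LSB algorithm, whose details are not given here. All function names, the toy records, and the parameters (`num_perm`, `bands`, trigram size) are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def pairs_from_blocks(blocks):
    """Turn {block_key: [record indices]} into candidate pairs (i, j) with i < j."""
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(sorted(set(members)), 2))
    return pairs


def standard_blocking(records, blocking_key):
    """SB: records that share a blocking-key value are placed in the same block."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    return pairs_from_blocks(blocks)


def trigrams(text):
    """Character 3-grams of a lower-cased, whitespace-normalised string."""
    text = " ".join(text.lower().split())
    return {text[i:i + 3] for i in range(max(len(text) - 2, 1))}


def minhash(shingles, num_perm):
    """MinHash signature: the minimum of each of num_perm seeded hash functions.
    md5 is used so results do not depend on Python's per-process hash seed."""
    return [
        min(int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles)
        for seed in range(num_perm)
    ]


def lsh_blocking(records, num_perm=40, bands=20):
    """Generic MinHash/LSH banding (assumed, not the paper's LSB): records whose
    signatures agree on every row of at least one band share a block."""
    rows = num_perm // bands
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        sig = minhash(trigrams(" ".join(rec.values())), num_perm)
        for b in range(bands):
            blocks[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(idx)
    return pairs_from_blocks(blocks)


# Hypothetical toy records: 0 and 1 are the same person with a typo.
records = [
    {"first": "john", "last": "smith", "city": "boston"},
    {"first": "jon", "last": "smith", "city": "boston"},
    {"first": "mary", "last": "jones", "city": "denver"},
]

prefix_pairs = standard_blocking(records, lambda r: r["first"][:2])
exact_pairs = standard_blocking(records, lambda r: r["first"])
lsh_pairs = lsh_blocking(records)

print(sorted(prefix_pairs))  # [(0, 1)]
print(sorted(exact_pairs))   # []
print(sorted(lsh_pairs))
```

With a two-character prefix of the first name as the key, `john` and `jon` share a block; with the full first name as the key they do not, which shows how strongly SB depends on the key. The LSH variant needs no key but brings its own parameters (number of hash functions and bands), which trade the chance of catching a true pair against the number of candidates.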


A proficient cost reduction framework for de-duplication of records in data integration

Sohail, Yousaf (2016). BMC Med Inform Decis Mak


Background: Record de-duplication is the process of identifying records that refer to the same entity. It plays a pivotal role in data mining applications that involve the integration of multiple data sources and data cleansing. It is a challenging task because of its computational complexity and the variations in data representation across data sources. Blocking and windowing are the commonly used methods for reducing the number of record comparisons during record de-duplication. Both require tuning a set of parameters, such as the choice of a particular variant of blocking or windowing and the selection of an appropriate window size for each dataset.

Methods: In this paper, we propose a framework that applies blocking and windowing techniques in succession, so that these parameters do not have to be determined. We also evaluate the impact of different configurations on dirty and massively dirty datasets. To evaluate the proposed framework, experiments are performed using Febrl (Freely Extensible Biomedical Record Linkage).

Results: The proposed framework is comprehensively evaluated using a variety of quality and complexity measures, such as reduction ratio, precision, and recall. The proposed framework is observed to significantly reduce the number of record comparisons.

Conclusions: The selection of the linkage key is a critical performance factor for record linkage.

Electronic supplementary material: The online version of this article (doi:10.1186/s12911-016-0280-9) contains supplementary material, which is available to authorized users.
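The comparison-reduction measures named in this abstract can be illustrated with a small Python sketch. It assumes records are plain strings and uses the classic sorted-neighbourhood method as the windowing step; the sort key, the window size, the toy records, and the function names are hypothetical, and the sketch is not the paper's framework, which combines blocking and windowing in a way not detailed here. Reduction ratio is computed against the n(n-1)/2 exhaustive comparisons, and precision and recall treat each candidate pair as a predicted match (in the blocking literature these are often called pair quality and pair completeness).

```python
def sorted_neighbourhood(records, sort_key, window):
    """Windowing: sort record indices by a key, then pair each record only
    with the next (window - 1) records in sorted order."""
    order = sorted(range(len(records)), key=lambda i: sort_key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1:pos + window]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


def reduction_ratio(num_candidates, num_records):
    """Share of the n(n-1)/2 exhaustive comparisons that are avoided."""
    total = num_records * (num_records - 1) // 2
    return 1.0 - num_candidates / total if total else 0.0


def precision_recall(candidates, true_matches):
    """Precision and recall of candidate pairs against known duplicate pairs."""
    hits = len(candidates & true_matches)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(true_matches) if true_matches else 0.0
    return precision, recall


# Hypothetical toy data: indices 0, 1 and 3 refer to the same person.
names = ["smith john", "smyth john", "jones mary", "smith jon", "brown ann"]
true_matches = {(0, 1), (0, 3), (1, 3)}

candidates = sorted_neighbourhood(names, sort_key=lambda s: s, window=2)
rr = reduction_ratio(len(candidates), len(names))
precision, recall = precision_recall(candidates, true_matches)

print(sorted(candidates))  # [(0, 2), (0, 3), (1, 3), (2, 4)]
print(round(rr, 3), precision, round(recall, 3))  # 0.6 0.5 0.667
```

Here a window of 2 keeps 4 of the 10 possible pairs (reduction ratio 0.6) but misses the `smith john`/`smyth john` pair, because `smith jon` sorts between them; a window of 3 recovers it at the cost of more comparisons, which is the kind of parameter tuning the abstract refers to.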


Ranking the Blocking Keys for Data Deduplication in Information Systems

Sohail, Jaffry (2021). IJBIS



