Awsan Thabet scite author profile

Awsan Thabet

2 Publications

6 Citation Statements Received

41 Citation Statements Given

Publications

Duplicates Detection Within Incomplete Data Sets Using Blocking and Dynamic Sorting Key Methods

Ali¹, Emran², Asmai³ et al. 2018

IJACSA

In duplicate detection over database records, blocking is commonly used to reduce the number of comparisons between candidate record pairs. The main step in this method is selecting the attributes that will be used as sorting keys. Accurate selection is essential for clustering candidate records that are likely to match into the same block. However, missing values affect the creation of sorting keys, which is particularly undesirable when they occur in the attributes used as sorting keys, because records that should be examined during duplicate detection are then excluded. In this paper, we propose a method that addresses the impact of missing values by using a dynamic sorting key. Dynamic sorting is an extension of blocking that relies on two functions: a uniqueness calculation function (UF), which chooses unique attributes, and a completeness function (CF), which searches for missing values. We tested a particular blocking method, sorted neighborhood, with a dynamic sorting key on a restaurant data set containing duplicate records, obtained from earlier research, to evaluate the method's accuracy and speed. Hypothetical missing values were introduced into the test data set, and we compared the results of duplicate detection with and without the dynamic sorting key. The results show that, even when missing values are present, there is a promising improvement in the partitioning of duplicate records into the same block.
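The abstract does not say how UF and CF are defined or combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes completeness is the fraction of non-missing values, uniqueness is the fraction of distinct values among the present ones, and attributes are ranked by the product of the two. Each record takes its key from the best-ranked attribute it actually has, and a sorted neighborhood window then yields candidate pairs. All function names and the toy records are hypothetical.

```python
def completeness(records, attr):
    """Assumed CF: fraction of records with a non-missing value for attr."""
    return sum(r.get(attr) is not None for r in records) / len(records)


def uniqueness(records, attr):
    """Assumed UF: fraction of distinct values among the present values."""
    present = [r[attr] for r in records if r.get(attr) is not None]
    return len(set(present)) / len(present) if present else 0.0


def rank_attributes(records, attrs):
    """Rank attributes by UF * CF, best first (the product is an assumption)."""
    return sorted(
        attrs,
        key=lambda a: uniqueness(records, a) * completeness(records, a),
        reverse=True,
    )


def dynamic_key(record, ranked_attrs):
    """Use the best-ranked attribute that is present in this record."""
    for attr in ranked_attrs:
        value = record.get(attr)
        if value is not None:
            return str(value).lower()
    return ""  # every candidate attribute is missing


def sorted_neighborhood_pairs(records, attrs, window=3):
    """Sort records by their dynamic key and pair records within a window."""
    ranked = rank_attributes(records, attrs)
    order = sorted(range(len(records)),
                   key=lambda i: dynamic_key(records[i], ranked))
    pairs = set()
    for start in range(len(order)):
        for j in range(start + 1, min(start + window, len(order))):
            a, b = order[start], order[j]
            pairs.add((min(a, b), max(a, b)))
    return pairs


# Made-up records: 0/1 and 2/3 are duplicates, several names are missing.
toy_records = [
    {"name": "alpha cafe", "phone": "555-0101"},
    {"name": None, "phone": "555-0101"},
    {"name": "beta grill", "phone": "555-0199"},
    {"name": None, "phone": "555-0199"},
    {"name": "gamma diner", "phone": "555-0150"},
    {"name": None, "phone": "555-0123"},
]
candidate_pairs = sorted_neighborhood_pairs(toy_records, ["name", "phone"], window=2)
```

On this toy data the ranking prefers `phone` because `name` is half missing, so both duplicate pairs end up adjacent after sorting. One limitation of the per-record fallback is that keys drawn from different attributes are not directly comparable, so records that fall back to a lower-ranked attribute may still sort far from their duplicates.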

Enhanced Robust Association Rules (ERAR) Method for Missing Values Imputation

Thabet¹ 2020

IJATCSE

Missing values, or incomplete data, are a common problem in many applications. In most cases, recovering missing values from data sets is necessary to avoid the biased conclusions that result from omitting them. Missing values recovery (also known as missing values imputation) is an important research subject in statistics and data mining. In this paper, we present the Enhanced Robust Association Rules (ERAR) method, which extracts useful association rules and avoids redundant ones. We show the enhancements made in ERAR to improve the imputation performed by the original Robust Association Rules (RAR) method. ERAR is designed to select only the frequent items in a data set that are related to missing values, so unnecessary frequent items can be ignored when generating the association rules. The experimental results show that ERAR offers better performance in terms of the time taken for the imputation process and the amount of memory used to complete it. In particular, ERAR behaves better with a monotone pattern of missing values than with an arbitrary pattern. In terms of imputation accuracy, we found that the accuracy of both ERAR and RAR decreases as the amount of missing values increases for data with an arbitrary pattern, but not for data with a monotone pattern. With these findings, ERAR contributes to improving how one can deal with incomplete data.
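The idea of restricting rule generation to items related to the missing values can be illustrated with a toy example. This is not the ERAR or RAR algorithm, whose details the abstract does not give. It is a simplified sketch that assumes single-item antecedents, takes them only from the values present in the incomplete record, and fills each gap with the consequent of the highest-confidence rule found in the complete records. The function name, thresholds, and data are hypothetical.

```python
from collections import Counter


def impute_with_rules(records, min_support=2, min_confidence=0.6):
    """Fill missing values using simple rules of the form (attr=value) -> target.

    Only items that occur in the incomplete record are considered as
    antecedents, so rules unrelated to the missing value are never built.
    All records are assumed to share the same attribute names.
    """
    complete = [r for r in records if all(v is not None for v in r.values())]
    imputed = []
    for record in records:
        filled = dict(record)  # copy so the input is left unchanged
        for target, value in record.items():
            if value is not None:
                continue
            best = None  # (confidence, support, predicted value)
            for attr, item in record.items():
                if attr == target or item is None:
                    continue
                # Target values seen in complete records containing this item.
                outcomes = [c[target] for c in complete if c[attr] == item]
                if not outcomes:
                    continue
                predicted, support = Counter(outcomes).most_common(1)[0]
                confidence = support / len(outcomes)
                if support < min_support or confidence < min_confidence:
                    continue
                if best is None or (confidence, support) > best[:2]:
                    best = (confidence, support, predicted)
            if best is not None:
                filled[target] = best[2]
        imputed.append(filled)
    return imputed


# Made-up data: the last record is missing its "play" value.
toy_rows = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "sunny", "wind": "weak", "play": None},
]
imputed_rows = impute_with_rules(toy_rows)
```

Here the rule `outlook=sunny -> play=no` has confidence 1.0 and beats `wind=weak -> play=yes` (confidence 2/3), so the gap is filled with `"no"`. A value is left missing when no rule meets both thresholds.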
