To help business intelligence companies with data preparation problems, several approaches have been developed for handling dirty data. However, these data cleansing approaches lack real-time monitoring capabilities, so business intelligence companies and their clients cannot predict the final outcome before running the entire business process. This incurs extra cost for the company when the data are highly corrupted. To reduce this cost, the authors design a framework that monitors quality attributes during the data cleansing process. The system also provides feedback to the user and allows the user to restructure the workflow based on those quality attributes. The framework is based on a client-server architecture that uses multithreading to enable real-time monitoring: one child thread is dedicated to running the cleansing processes, and another is dedicated to monitoring them and giving feedback to the user. The real-time monitoring system not only displays the cleansing operations performed on the data set, but also estimates the risk propagation probabilities in the data cleansing process. Duplicate elimination, address normalization, spelling correction for personal names, and non-ASCII character removal techniques are employed.
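The two-thread arrangement described in the abstract can be illustrated with a minimal sketch: one thread cleanses records while a second thread periodically samples quality counters. This is not the authors' implementation. The names `QualityStats`, `cleanse`, `monitor`, and `run_pipeline`, the choice of counters, and the polling interval are all assumptions made for illustration. Only duplicate elimination and non-ASCII character removal are shown; the client-server layer and the risk propagation estimate are left out.

```python
import threading
import unicodedata


def strip_non_ascii(text):
    """Decompose accented characters, then drop anything outside ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


class QualityStats:
    """Hypothetical quality counters shared by the worker and the monitor."""

    def __init__(self, total):
        self._lock = threading.Lock()
        self.total = total
        self.processed = 0
        self.duplicates = 0
        self.non_ascii_fixed = 0

    def update(self, is_duplicate, was_fixed):
        with self._lock:
            self.processed += 1
            self.duplicates += int(is_duplicate)
            self.non_ascii_fixed += int(was_fixed)

    def snapshot(self):
        with self._lock:
            return {
                "total": self.total,
                "processed": self.processed,
                "duplicates": self.duplicates,
                "non_ascii_fixed": self.non_ascii_fixed,
            }


def cleanse(records, stats, cleaned, done):
    """Worker thread: remove non-ASCII characters, then drop duplicates."""
    seen = set()
    try:
        for record in records:
            ascii_text = strip_non_ascii(record)
            normalized = " ".join(ascii_text.split())  # collapse whitespace
            key = normalized.lower()                    # case-insensitive match
            is_duplicate = key in seen
            if not is_duplicate:
                seen.add(key)
                cleaned.append(normalized)
            stats.update(is_duplicate, ascii_text != record)
    finally:
        done.set()  # always release the monitor, even if cleansing fails


def monitor(stats, done, reports, interval=0.01):
    """Monitor thread: sample the counters until the worker signals completion."""
    while not done.wait(interval):
        reports.append(stats.snapshot())
    reports.append(stats.snapshot())  # final report after the worker finishes


def run_pipeline(records):
    stats = QualityStats(len(records))
    cleaned, reports = [], []
    done = threading.Event()
    worker = threading.Thread(target=cleanse, args=(records, stats, cleaned, done))
    watcher = threading.Thread(target=monitor, args=(stats, done, reports))
    watcher.start()
    worker.start()
    worker.join()
    watcher.join()
    return cleaned, reports
```

Calling `run_pipeline(["José Díaz", "Jose Diaz", "Bob"])` returns the cleaned records together with a list of snapshots, the last of which reflects the completed run. In a fuller system the monitor would presumably push each snapshot to the client instead of collecting it in a list, and the user could act on it to restructure the workflow.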