The conditions under which the data wrangling process is undertaken have a profound impact on the quality of the results of the data wrangling and analysis. This paper presents the results of an analysis of the socio-technical aspects of a data wrangling activity in a large, multi-site global manufacturer. The activity was technically demanding, as operational data from multiple sources and formats needed to be integrated, but it also involved interaction with multiple stakeholders in different parts of the world, each with their own ways of collecting and structuring the data. The data had previously been captured for a different purpose. The clients were not aware that the data followed a different logic at the various sites and in some cases needed to be manually extracted and interpreted. The paper describes the data wrangling process and analyses the assumptions, goals and biases of the different stakeholders. The analysis raises questions about, and offers insights into, how far data can be trusted, and suggests that human intervention in the data during the data wrangling process is often unintentional, tacit and easily overlooked. It is suggested that contextual factors, such as data quality and an assessment of the consequences of acting or making decisions on the new data set, be given greater attention when data wrangling assignments are specified. The paper concludes with recommendations for data wrangling practitioners.