The current information age has increasingly required organizations to become data-driven. However, analyzing and managing raw data is still a challenging part of the data mining process. Although existing interview studies propose design implications or recommendations for future visualization solutions in data mining, they cover the entire workflow and do not focus specifically on the challenges of the preprocessing phase or on how visualization can support it. Moreover, they do not organize a final list of insights that consolidates the findings of other related studies. Hence, to better understand the current practice of enterprise professionals in data mining workflows, in particular during the preprocessing phase, and how visualization supports this process, we conducted semi-structured interviews with thirteen data analysts. Discussing the challenges and opportunities raised in the interviewees' responses resulted in a list of ten insights. We compared this list with the closest related works, which improves the reliability of our findings and provides background, as a consolidated set of requirements, for future visualization research on visual data exploration in data mining. Furthermore, we provide greater detail on the profile of the data analysts, the main challenges they face, and the opportunities that arise while they are engaged in data mining projects in diverse organizational areas.
To accommodate the demands of a data-driven society, we have expanded our ability to collect and store data, develop sophisticated algorithms, and generate elaborate visual representations of the outcomes of the data analysis process. However, data preprocessing, the activity of transforming raw data into an appropriate format for subsequent analysis, is still a challenging part of this process. Although some studies address the use of visualization techniques to support preprocessing activities, current Visual Analytics processes do not treat preprocessing as an equally important phase. Hence, with this paper, we aim to contribute to the discussion of how to incorporate preprocessing as a prominent phase in the Visual Analytics process and to promote better alternatives for assisting data analysts during preprocessing activities. To achieve that, we introduce the Preprocessing Profiling Approach for Visual Analytics (PrAVA), a conceptual Visual Analytics process that includes Preprocessing Profiling as a new phase. It also includes a set of guidelines to be considered by new solutions adopting PrAVA. Moreover, we analyze its applicability through use case scenarios that show resourceful methods for understanding data and evaluating the impacts of preprocessing. As a final contribution, we present a list of research opportunities in preprocessing combined with visualization and Visual Analytics to stimulate a shift to visual preprocessing.
Analyzing and managing raw data is still a challenging part of the data analysis process, mainly with regard to data preprocessing. Although some studies propose design implications or recommendations for visualization solutions in data analysis, they do not focus on the challenges of the preprocessing phase. Likewise, current Visual Analytics processes do not treat preprocessing as an equally important stage. Thus, with this study, we aim to contribute to the discussion of how to use and combine visualization and data mining methods to assist data analysts during preprocessing activities. To achieve that, we introduce the Preprocessing Profiling Model for Visual Analytics, which includes a set of features intended to inspire the implementation of new solutions. These features were designed based on a list of insights we obtained from an interview study with thirteen data analysts. In summary, our contributions offer resources to promote a shift to visual preprocessing.