2022 ACM Conference on Fairness, Accountability, and Transparency 2022
DOI: 10.1145/3531146.3533175

Smallset Timelines: A Visual Representation of Data Preprocessing Decisions

Abstract: Data preprocessing is a crucial stage in the data analysis pipeline, with both technical and social aspects to consider. Yet, the attention it receives is often lacking in research practice and dissemination. We present the Smallset Timeline, a visualisation to help reflect on and communicate data preprocessing decisions. A "Smallset" is a small selection of rows from the original dataset containing instances of dataset alterations. The Timeline is comprised of Smallset snapshots representing different points …

Cited by 26 publications (7 citation statements)
References 35 publications
“…To bridge the expertise gap, systems should be designed to aid domain experts in understanding data science techniques in a manner that better matches their mental model of data [21]. Visualizations of data changes [17,24,32] have proved to be an effective means of supporting understanding beyond serving the function of documentation. Recent work in Explainable AI [7,9,11,18,50] has also demonstrated ways to make ML components more interpretable to non-experts through visualization and direct manipulation interactions [2,29,39].…”
Section: Related Work (mentioning)
confidence: 99%
“…Informed by the challenges and potential solutions identified in prior research, we explore the use of LLMs to supplement human effort in translating code into natural language and explainable visualizations. In CellSync, we build on the Smallset Timelines visualization technique [24] to select and visualize a digestible subset of rows and columns for domain experts. Further, we utilize LLMs' code summarization capabilities [1] and refine these summaries through targeted prompt engineering to tailor them towards domain experts.…”
Section: Related Work (mentioning)
confidence: 99%
“…Standardised description of specific preprocessing steps is challenging due to the wide variety of possible data alterations. Moreover, as observed by Lucchesi et al. (2022), definitions of data preprocessing vary with audience and context from highly specific lists of tasks, to broadly encompassing boundaries within a longer data pipeline. Existing provenance tools such as (Lucchesi et al. 2022; Kai Xiong et al. 2022; Wang et al. 2022) attempt to achieve generality by comparing dataset snapshots at various points in a preprocessing pipeline.…”
Section: Data Provenance (mentioning)
confidence: 99%