Proceedings of the 2019 International Conference on Management of Data 2019
DOI: 10.1145/3299869.3300107

Towards Understanding Data Analysis Workflows using a Large Notebook Corpus

Cited by 9 publications (4 citation statements)
References 2 publications

“…Despite synthesizing rich observations, interview studies were limited to dozens of participants. A few studies conducted large-scale analysis of Jupyter notebooks, but were limited to simple summary statistics [6], a single library [7], or code quality [8]. Our model enables the analysis of data science both at scale and in depth, which may validate and complement findings from previous qualitative studies.…”
Section: Studies Of Data Analysis Practices (mentioning)
confidence: 78%
“…In this layer, we aim to compute p_stage, a probability distribution over these six stages, from the topic distribution computed in Eq. (7). We implement this by mapping the topic distribution p_topic to a probability distribution p_stage over the n_stages = 6 stages.…”
Section: A Cell With One Line Of Code That Does Not Create a New Vari... (mentioning)
confidence: 99%
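
The quoted statement does not say how the citing paper maps p_topic to p_stage. As a minimal sketch only, assuming a fixed non-negative weight matrix whose columns each sum to 1 (the function name `topic_to_stage`, the three example topics, and all weight values are hypothetical, not taken from the cited work), one way to turn a topic distribution into a distribution over six stages is:

```python
def topic_to_stage(p_topic, weights):
    """Map a topic distribution to a stage distribution.

    weights[s][t] is an assumed, hypothetical weight linking topic t to
    stage s. If every column of weights sums to 1, the result is already
    a probability distribution; the final renormalisation only guards
    against rounding error.
    """
    p_stage = [sum(w * p for w, p in zip(row, p_topic)) for row in weights]
    total = sum(p_stage)
    return [value / total for value in p_stage]


# Hypothetical example: 3 topics, n_stages = 6.
# Each topic splits its mass evenly between two stages.
weights = [
    [0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5],
]
p_topic = [0.2, 0.3, 0.5]
p_stage = topic_to_stage(p_topic, weights)
```

A learned mapping (for example a linear layer followed by a softmax) would serve the same purpose; the fixed matrix is used here only because it keeps the example self-contained.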
“…Previous in-depth analyses of scientific code heavily rely on expert annotations, limiting the scale of these studies to the order of a hundred examples [5], [6]. Large-scale studies across thousands of examples have been limited to simple summaries such as the number or nature of imported libraries, total line counts, or the fraction of lines that are used for comments [6], [7], [8]. The software engineering community has emphasized the inadequacy of these analyses, noting that "there is a strong need to programmatically analyze Jupyter notebooks" [9], while HCI researchers have observed that studying the data science…”
Section: Introduction (mentioning)
confidence: 99%
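
The "simple summaries" named in this statement (imported libraries, total line counts, fraction of comment lines) can be illustrated with a short sketch. It assumes a notebook stored in the standard nbformat version 4 JSON layout; the function name `summarize_notebook` and the sample notebook are hypothetical, and this is not the procedure used by any of the cited studies.

```python
import ast
import json


def summarize_notebook(nb_json):
    """Compute simple summary statistics for one notebook given as JSON text."""
    nb = json.loads(nb_json)
    libraries, code_lines, comment_lines = set(), 0, 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        lines = [ln for ln in source.splitlines() if ln.strip()]
        code_lines += len(lines)
        comment_lines += sum(ln.lstrip().startswith("#") for ln in lines)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # e.g. cells containing IPython magics
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                libraries.update(a.name.split(".")[0] for a in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                libraries.add(node.module.split(".")[0])
    fraction = comment_lines / code_lines if code_lines else 0.0
    return {
        "libraries": sorted(libraries),
        "code_lines": code_lines,
        "comment_fraction": fraction,
    }


# Hypothetical two-cell notebook used only to exercise the function.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code",
         "source": ["import pandas as pd\n", "# load data\n",
                    "df = pd.read_csv('x.csv')\n"]},
        {"cell_type": "code",
         "source": "from sklearn.linear_model import LinearRegression\n"
                   "model = LinearRegression()"},
    ]
})
stats = summarize_notebook(sample)
```

Statistics of this kind say what a notebook imports and how long it is, but not what each cell does, which is the limitation the statement points to.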