2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2021
DOI: 10.1109/ase51524.2021.9678520

Subtle Bugs Everywhere: Generating Documentation for Data Wrangling Code

Cited by 16 publications (9 citation statements)
References 39 publications
“…For example, the release of Jupyter Lab 3.0 introduced a visual debugger that can be used to step through code or to check the value of a variable [20]. This need for more tool support is suggested by other studies as well [5,27].…”
Section: Discussion
Type: mentioning
confidence: 99%
“…These studies agree that notebook code is frequently low-quality and error-prone. Closely related to our work is that from Yang et al [27]. They report on a tool to support bug detection in Kaggle notebooks, which they characterized as 'data wrangling code'.…”
Section: Introduction
Type: mentioning
confidence: 88%
“…According to a recent study [63], data handling code tends to be error-prone and often contains subtle issues. We claim that such poor data handling practices will likely cause data smells.…”
Section: Data Handling
Type: mentioning
confidence: 99%
“…As the conversion input represents just a date, developers may be unaware that without further declaration, a time suffix (i.e., "00:00:00") is added. Such issues often go undetected because of the common programmatic style of method chaining when processing data [63]. By sequencing multiple data operations (i.e., method chaining), developers cannot see the intermediate processing results and thus identify problems introduced in the data [61].…”
Section: Data Handling
Type: mentioning
confidence: 99%
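
The issue described in the last statement can be illustrated with a minimal sketch. The quoted text does not name a library, so this example assumes Python's standard `datetime` module; the helper `parse_day` and the sample values are hypothetical. It shows how a date-only input silently gains a midnight time component, and how a chained expression hides the intermediate value that would reveal it.

```python
from datetime import datetime


def parse_day(text):
    # Hypothetical helper. strptime always returns a full datetime;
    # with a date-only format the time fields default to midnight.
    return datetime.strptime(text, "%Y-%m-%d")


# Illustrative sample input, not taken from the cited papers.
raw = ["2021-03-15", "2021-11-02"]

# Chained style: parsing and string conversion happen in one expression,
# so the intermediate datetime values are never inspected.
chained = [str(parse_day(r)) for r in raw]

# Stepwise style: keep the intermediate result so it can be checked,
# then explicitly reduce it to a date before formatting.
parsed = [parse_day(r) for r in raw]
dates_only = [p.date().isoformat() for p in parsed]

print(chained)     # ['2021-03-15 00:00:00', '2021-11-02 00:00:00']
print(dates_only)  # ['2021-03-15', '2021-11-02']
```

Keeping `parsed` as a named intermediate is the design choice that matters here: it gives the developer a point at which the unexpected "00:00:00" suffix becomes visible.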