Proceedings of the 19th International Conference on Mining Software Repositories 2022
DOI: 10.1145/3524842.3528447

A large-scale comparison of Python code in Jupyter notebooks and scripts

Cited by 16 publications (3 citation statements)
References 18 publications

“…Code development inside virtual notebooks often differs substantially from best practices in software engineering [23,33]. As noted above, developers frequently use notebooks for exploratory calculations instead of building well structured workflows.…”
Section: Literate Programming and Computational Continuity (mentioning; confidence: 99%)

“…We applied this approach to find the optimal value for Jupyter Notebooks for two new IDEs, Datalore [8] and DataSpell [9]. Research shows that code clones are frequent in notebooks [11] and that the code in regular Python and notebooks is different [12]. We sampled 10,000 Python scripts and 10,000 Jupyter notebooks with permissive licences and 10+ stars on GitHub from our previous work [12].…”
Section: Approach Evaluation and Future Work (mentioning; confidence: 99%)

“…Research shows that code clones are frequent in notebooks [11] and that the code in regular Python and notebooks is different [12]. We sampled 10,000 Python scripts and 10,000 Jupyter notebooks with permissive licences and 10+ stars on GitHub from our previous work [12]. Next, we applied our algorithm, and calculated that the default PyCharm's threshold of 45 CST elements corresponds to the 95th percentile of the distribution, meaning that 5% of all potential Python clones are underlined.…”
Section: Approach Evaluation and Future Work (mentioning; confidence: 99%)
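The last statement links a clone-size threshold (45 CST elements in PyCharm) to the 95th percentile of the clone-size distribution, i.e. to roughly 5% of candidate clones being underlined. The sketch below illustrates that relationship under stated assumptions: a clone is underlined only when its size reaches the threshold, and the percentile is taken with a simple empirical rule over observed sizes. The function names and the synthetic sizes are hypothetical and are not taken from the cited work, whose exact percentile definition and tie handling are not given here.

```python
from bisect import bisect_left
from typing import Sequence


def share_at_or_above(sizes: Sequence[int], threshold: int) -> float:
    """Fraction of candidate clones whose size (in CST elements) is >= threshold.

    Assumption: a clone is underlined only when its size reaches the
    threshold, so this is the share an IDE would highlight.
    """
    if not sizes:
        raise ValueError("sizes must be non-empty")
    ordered = sorted(sizes)
    # bisect_left returns how many sizes are strictly below the threshold.
    below = bisect_left(ordered, threshold)
    return (len(ordered) - below) / len(ordered)


def threshold_for_share(sizes: Sequence[int], share: float) -> int:
    """Smallest observed size t such that at most `share` of clones have size >= t.

    With share=0.05 this picks a threshold near the 95th percentile of the
    size distribution (simple empirical rule; tie handling is an assumption).
    """
    if not sizes:
        raise ValueError("sizes must be non-empty")
    ordered = sorted(sizes)
    n = len(ordered)
    # The underlined share never increases as t grows, so scan candidates
    # in ascending order and stop at the first one meeting the target.
    for t in sorted(set(ordered)):
        if (n - bisect_left(ordered, t)) / n <= share:
            return t
    # Reached only when even the largest size exceeds the target share;
    # a threshold above the maximum underlines nothing.
    return ordered[-1] + 1


# Illustrative synthetic sizes 1..100, not data from the cited study.
sizes = list(range(1, 101))
print(share_at_or_above(sizes, 96))      # 0.05
print(threshold_for_share(sizes, 0.05))  # 96
```

Scanning candidate thresholds in ascending order works because the underlined share is monotonically non-increasing in the threshold, so the first candidate that meets the target is the smallest one.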