A large-scale comparison of Python code in Jupyter notebooks and scripts

Grotov, Konstantin; Titov, Sergey; Sotnikov, V. I.; Golubev, Yaroslav; Bryksin, Timofey

doi:10.1145/3524842.3528447

Cited by 16 publications

(3 citation statements)

References 18 publications

“…Code development inside virtual notebooks often differs substantially from best practices in software engineering [23,33]. As noted above, developers frequently use notebooks for exploratory calculations instead of building well structured workflows.…”

Section: Literate Programming and Computational Continuity (mentioning)

confidence: 99%

Exploratory data science on supercomputers for quantum mechanical calculations

Dawson, Beal, Ratcliff et al. 2024, Electron. Struct.

Literate programming — the bringing together of program code and natural language narratives — has become a ubiquitous approach in the realm of data science. This methodology is appealing as well for the domain of Density Functional Theory (DFT) calculations, particularly for interactively developing new methodologies and workflows. However, effective use of literate programming is hampered by old programming paradigms and the difficulties associated with using High Performance Computing (HPC) resources. Here we present two Python libraries that aim to remove these hurdles. First, we describe the PyBigDFT library, which can be used to setup materials or molecular systems and provides high-level access to the wavelet based BigDFT code. We then present the related remotemanager library, which is able to serialize and execute arbitrary Python functions on remote supercomputers. We show how together these libraries enable transparent access to HPC based DFT calculations and can serve as building blocks for rapid prototyping and data exploration.

“…We applied this approach to find the optimal value for Jupyter Notebooks for two new IDEs - Datalore [8] and DataSpell [9]. Research shows that code clones are frequent in notebooks [11] and that the code in regular Python and notebooks is different [12]. We sampled 10,000 Python scripts and 10,000 Jupyter notebooks with permissive licences and 10+ stars on GitHub from our previous work [12].…”

Section: Approach Evaluation and Future Work (mentioning)

confidence: 99%

“…Research shows that code clones are frequent in notebooks [11] and that the code in regular Python and notebooks is different [12]. We sampled 10,000 Python scripts and 10,000 Jupyter notebooks with permissive licences and 10+ stars on GitHub from our previous work [12]. Next, we applied our algorithm, and calculated that the default PyCharm's threshold of 45 CST elements corresponds to the 95th percentile of the distribution, meaning that 5% of all potential Python clones are underlined.…”

Section: Approach Evaluation and Future Work (mentioning)

confidence: 99%
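
The percentile arithmetic in the statement above can be illustrated with a minimal sketch. It is not the citing authors' algorithm: the function name `percentile_rank` and the synthetic size distribution are assumptions made purely for illustration, and the only figure taken from the quoted text is the threshold of 45 elements.

```python
import random


def percentile_rank(values, threshold):
    """Return the percentage of values that are less than or equal to threshold."""
    if not values:
        raise ValueError("values must be non-empty")
    return 100.0 * sum(v <= threshold for v in values) / len(values)


# Synthetic stand-in data (assumption): sizes of candidate clones, measured in
# tree elements. Real sizes would come from running a clone detector on a corpus.
rng = random.Random(0)
sizes = [int(rng.lognormvariate(2.5, 0.8)) + 1 for _ in range(10_000)]

threshold = 45  # the threshold value mentioned in the quoted statement
rank = percentile_rank(sizes, threshold)
share_above = 100.0 - rank  # share of candidates larger than the threshold

print(f"threshold {threshold} sits at percentile {rank:.1f}")
print(f"{share_above:.1f}% of candidates exceed the threshold")
```

If a threshold sits at the 95th percentile of such a distribution, the remaining 5% of candidates exceed it, which matches the reading given in the quoted statement.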