Harvard Data Science Review 2019
DOI: 10.1162/99608f92.02ffc552

Ambitious Data Science Can Be Painless

Abstract: Modern data science research, at the cutting edge, can involve massive computational experimentation; an ambitious PhD in computational fields may conduct experiments consuming several million CPU hours. Traditional computing practices, in which researchers use laptops, PCs, or campus-resident resources with shared policies, are awkward or inadequate for experiments at the massive scale and varied scope that we now see in the most ambitious data science. On the other hand, modern cloud computing promises seemi…

Cited by 8 publications (8 citation statements)
References 22 publications
“…The total number of models fully trained for this paper is tallied below: The massive computational experiments reported here were run painlessly using ClusterJob and ElastiCluster ( 19 21 ) on the Stanford Sherlock HPC (high performance computing) cluster and Google Compute Engine virtual machines.…”
Section: Setting and Methodology | Citation type: mentioning
confidence: 99%
“…Certain problems like sensitivity study and ensemble forecasting require a large number of independent model simulations that can run on individual compute nodes (“embarrassingly parallel”). With the large resource pool on public cloud platforms, independent jobs can be executed simultaneously and finish much faster (Monajemi et al, 2019). For example, the AWS cloud has provided 40,000 compute nodes for one industrial HPC use case (Amazon, 2019g).…”
Section: Benefits Of Cloud Computing For Earth Science Research | Citation type: mentioning
confidence: 99%