2024
DOI: 10.1038/s41597-023-02854-0

Open source and reproducible and inexpensive infrastructure for data challenges and education

Peter E. DeWitt, Margaret A. Rebull, Tellen D. Bennett

Abstract: Data sharing is necessary to maximize the actionable knowledge generated from research data. Data challenges can encourage secondary analyses of datasets. Data challenges in biomedicine often rely on advanced cloud-based computing infrastructure and expensive industry partnerships. Examples include challenges that use Google Cloud virtual machines and the Sage Bionetworks Dream Challenges platform. Such robust infrastructures can be financially prohibitive for investigators without substantial resources. Given…

Cited by 1 publication (1 citation statement). References: 15 publications.
“…We noticed that Microsoft Azure GPT-4 outperformed AWS EC2 Llama 2 in terms of price, execution speed, and accuracy. However, as an open-source model, Llama 2 may have better reproducibility 37, while GPT-4 may provide slightly different answers over time due to model updates from OpenAI 38. Secondly, we tested several prompting strategies and found that performing error analysis on some training data, then revising the prompt to ask the LLM to avoid the summarized common errors, can be an effective way to improve the LLM's performance.…”
Section: Discussion
confidence: 99%
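The prompt-revision strategy described in that citation statement (run a baseline prompt over labeled training data, summarize the common errors, and fold that summary back into the prompt) can be sketched roughly as below. This is an illustrative Python sketch, not code from the cited work; call_llm and classify_error are hypothetical placeholders for the hosted model endpoint (e.g. Azure GPT-4 or an EC2-hosted Llama 2) and for the error-labeling step.

```python
# Illustrative sketch of error-analysis-driven prompt revision.
# call_llm and classify_error are hypothetical placeholders, not APIs
# from the cited paper or any specific cloud provider.

from collections import Counter
from typing import Callable, Iterable, Tuple


def revise_prompt_with_error_analysis(
    base_prompt: str,
    training_data: Iterable[Tuple[str, str]],   # (input_text, expected_answer) pairs
    call_llm: Callable[[str], str],             # placeholder: prompt -> model answer
    classify_error: Callable[[str, str], str],  # placeholder: (predicted, expected) -> error label
    top_k: int = 3,
) -> str:
    """Return a revised prompt that asks the model to avoid its most common errors."""
    error_counts: Counter = Counter()

    # 1. Error analysis: run the baseline prompt on labeled training examples.
    for input_text, expected in training_data:
        predicted = call_llm(f"{base_prompt}\n\nInput:\n{input_text}")
        if predicted.strip() != expected.strip():
            error_counts[classify_error(predicted, expected)] += 1

    if not error_counts:
        return base_prompt  # nothing to revise

    # 2. Summarize the most frequent error types.
    common_errors = [label for label, _ in error_counts.most_common(top_k)]
    error_summary = "\n".join(f"- {label}" for label in common_errors)

    # 3. Revise the prompt so the model is told to avoid those errors.
    return (
        f"{base_prompt}\n\n"
        f"Common mistakes observed on similar inputs; avoid them:\n{error_summary}"
    )
```

Keeping the model call behind a generic callable is a deliberate choice in this sketch: the same revision loop could then be pointed at either a GPT-4 deployment or a self-hosted Llama 2 endpoint without changing the logic.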