2019
DOI: 10.1016/j.tig.2018.12.006

Data Lakes, Clouds, and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data

Abstract: Data commons collate data with cloud computing infrastructure and commonly used software services, tools, and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize, and share large-scale genomics datasets. Data ecosystems can be built by interoperating multiple data commons. It can be quite labor-intensive to curate, import, and analyze the data in a data common…

Cited by 47 publications (34 citation statements)
References 61 publications
[Author Manuscript Published OnlineFirst on February 12, 2020; DOI: 10.1158/1055-9965.EPI-19-0842]
“…infrastructure meets the CRDC's requirements (22,23). This demonstrates that the DW infrastructure SDSC offers is consistent with the research community's current standards and reaffirms that DWs have the flexibility and scalability to support CEC research.…”
Section: Discussion (mentioning)
confidence: 59%
“…Recent progress has brought substantial transformations in how the petabytes of genomic data being generated each year are assimilated and analysed, including the emergence of cloud-based and federated approaches. Effective and efficient management of increasingly complex genomic datasets requires addressing challenges with these emerging approaches as well as innovations in the use of hardware, algorithms, software, standards, and platforms [40]. Current barriers include the lack of interoperable genomic data resources (which limits downstream access, integration, and analyses) and the absence of controlled and consistently adopted data and metadata vocabularies and ontologies [41,42].…”
Section: Box (mentioning)
confidence: 99%
“…If participants' or collaborators' institutions are equipped with large in-house high-performance computing resources, they will likely have more direct access and practical assistance in their genome project. Otherwise, cloud-based computing is a potential solution that has been widely emphasized in previous works including easy-to-follow steps [88][89][90]. While cloud computing provides flexibility, competitive pricing, and continually updated hardware and software, it still requires assistance from information technology (IT) specialists to set up suitable cloud-based software.…”
Section: Step 6: Check the Computational Resources And Requirements (mentioning)
confidence: 99%