2021
DOI: 10.1101/2021.04.22.436044
Preprint

Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL)

Abstract: The traditional model of genomic data analysis - downloading data from centralized warehouses for analysis with local computing resources - is increasingly unsustainable. Not only are transfers slow and cost prohibitive, but this approach also leads to redundant and siloed compute infrastructure that makes it difficult to ensure security and compliance of protected data. The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) inverts this model, provi…

citations
Cited by 29 publications
(26 citation statements)
references
References 57 publications
“…To investigate how the T2T-CHM13 assembly impacts short-read variant calling, we realigned and reprocessed all 3,202 samples from the recently expanded 1KGP cohort [28] using the NHGRI AnVIL Platform [39]. In this collection, each sample is sequenced to at least 30x coverage using paired-end Illumina sequencing, with samples from 26 diverse populations across 5 major continental superpopulations ( Fig.…”
Section: Results (mentioning)
confidence: 99%
“…GA4GH Driver Projects and other partners are beginning to implement cloud-based workflows built on GA4GH standards that allow scientists to share, access, and interrogate data stored at disparate sites around the globe. Some concrete examples of this access pattern include (1) the Data Coordination Platform of the Human Cell Atlas, an internationally federated compute environment for analyzing single-cell data; (2) Genomics England’s secure Research Environment for approved investigators to access the 100,000 Genomes Project dataset; (3) the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) [30] and the Gen3 Data Commons, which provide cloud-based spaces for scientists to work with large-scale genomic and genomic-related datasets and shared tools; and (4) H3ABioNet, a bioinformatics platform that serves data from the Human Heredity and Health in Africa (H3Africa) network to researchers across the continent and provides containerized workflows for analysis of the data.…”
Section: GA4GH Organization (mentioning)
confidence: 99%
“…The contents of the resource and its user interfaces are illustrated in Figure 1a. In addition to SciServer-Compute, we have made recount3 available from AnVIL (Schatz et al, 2021), the genomic data science cloud platform from NHGRI. In summary, we created an RNA-seq processing framework and resulting resource to facilitate reanalysis of hundreds of thousands of RNA-seq samples.…”
Section: The recount3 Resource (mentioning)
confidence: 99%