2022
DOI: 10.1101/2022.03.20.485034
Preprint

A genome-wide mutational constraint map quantified from variation in 76,156 human genomes

Abstract: The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders, but attempts to assess constraint for non-protein-coding regions have proven more difficult. Here we aggregate, process, and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome reference dataset, and use this dataset to build a mutational constraint map for th…

Cited by 292 publications (358 citation statements)
References 97 publications
“…To demonstrate the added benefit of jointly calling these two datasets, we have compiled metrics that compare our harmonized dataset with each individual dataset comprising it 1,7, the previous phase 3 1kGP dataset sequenced to lower coverage 4, and the widely used gnomAD dataset 17. This jointly called HGDP+1kGP dataset contains 159,795,273 SNVs and indels that pass QC, whereas phase 3 1kGP has 73,257,633, high-coverage WGS of 1kGP (referred to here as NYGC 1kGP based on where they were sequenced) has 119,895,186, and high-coverage WGS of HGDP (referred to here as Bergstrom HGDP based on the publication) has 75,310,370.…”
Section: Results (mentioning)
confidence: 99%
“…The harmonized variant processing, quality control, and improved coverage of variants across the allele frequency spectrum in this jointly called resource will facilitate the improved study of diverse populations. Due to our rapid release of the data pre-publication, the callset formally released here has already been used as a resource of global diversity in the Genome Aggregation Database (gnomAD) 17, the Pan-UK Biobank Project 24, the Global Biobank Meta-analysis Initiative (GBMI) 25, and the Covid-19 Host Genetics Initiative 26. A primary use of this data is as a global reference for principal components analysis (PCA): SNV loadings are freely shared so that user cohorts can be aligned to the same PC space as this optimized reference panel.…”
Section: Discussion (mentioning)
confidence: 99%
“…Our data support both, but outside of recurrent Robertsonian translocations and our expectation that non-crossover recombination is significantly more common (~10:1) than crossover (Gay, Myers, and McVean 2007; Cole, Keeney, and Jasin 2010), we lack distinguishing evidence for either. To estimate relative rates of each type of event, we can use linkage disequilibrium patterns (Gay, Myers, and McVean 2007) to study the PHRs in large genomic cohorts (Taliun et al 2021; Chen et al 2022), which will require realigning cohort short read data to T2T-CHM13 or the HPRC pangenome. Future improvements to assembly of the SAACs and the planned increase in the number of individuals included in the HPRC should allow for confident estimates of the relative rates of recombination types.…”
Section: Discussion (mentioning)
confidence: 99%
“…1A). No variation was found at this position in more than 76 000 genomes from healthy individuals of diverse ancestries in the gnomAD database (v3.1.2; https://gnomad.broadinstitute.org), indicating that Pro401Leu is not a common variant 26,27. Pro401 is located at the bottom of the C2B domain (Fig.…”
Section: Results (mentioning)
confidence: 99%