2023
DOI: 10.1101/2023.05.09.539329
Preprint

A deep catalog of protein-coding variation in 985,830 individuals

Kathie Y. Sun, Xiaodong Bai, Siying Chen, et al.

Abstract: Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian,…

Cited by 10 publications (6 citation statements)
References: 100 publications

“…4b). For comparison, observed participants of the Regeneron Genetics Center Million Exome dataset(30) carry an average of 1.6 high-confidence ClinVar pathogenic variants. This suggests that our definition of pathogenicity is more heavily selected against in the population than ClinVar Pathogenic variants.…”
Section: Results · mentioning · confidence: 99%

“…This is especially true for noncoding variation as in silico tools capable of evaluating their functional impact remain in a nascent stage. Furthermore, as population genomic databases which do not contain individual-level phenotypic data increase in size by ingesting data from large-scale programs like UK Biobank, All of Us, 100,000 Genomes Project, and Regeneron Genetics Center which do not screen out individuals with severe neurodevelopmental disorders, there may be new challenges in the interpretation of ultra-rare genetic variants in a Mendelian context [ 129 , 130 , 136 , 137 ].…”
Section: Functional Genomics Of Mitochondrial Neurodevelopmental Diso... · mentioning · confidence: 99%

“…TSC2 missense variants were curated from a variety of sources including: (1) gnomAD v4, (2) the Regeneron exome server, (3) ClinVar, and (4) the TSC2 Leiden Open Variation Database (LOVD). [16][17][18][19] We subset this data into a truth set of variants present in ClinVar, containing 246 benign or likely benign (BLB) variants and 130 PLP variants. Variants overlapping this ClinVar truth set were discarded from the gnomAD, Regeneron, and LOVD datasets.…”
Section: Supervised Learning · mentioning · confidence: 99%
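The truth-set step quoted in the last statement (keep ClinVar variants classified as benign/likely benign or pathogenic/likely pathogenic as labeled examples, then discard any variant from the other sources that overlaps that set) can be sketched as below. This is a minimal illustration, not the citing authors' code: the `Variant` key, the exact classification strings, the function names `build_truth_set` and `remove_truth_overlap`, and the toy coordinates are all hypothetical, and it assumes variants were already normalized to one representation before comparison.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    # Hypothetical minimal variant key. Real pipelines usually normalize
    # (left-align indels, split multiallelic sites) before comparing keys.
    chrom: str
    pos: int
    ref: str
    alt: str


# Assumed ClinVar-style classification strings for this sketch.
BENIGN = {"Benign", "Likely benign"}
PATHOGENIC = {"Pathogenic", "Likely pathogenic"}


def build_truth_set(clinvar: dict[Variant, str]) -> dict[Variant, str]:
    """Keep only BLB or PLP ClinVar variants, relabeled as 'BLB' or 'PLP'."""
    truth: dict[Variant, str] = {}
    for var, significance in clinvar.items():
        if significance in BENIGN:
            truth[var] = "BLB"
        elif significance in PATHOGENIC:
            truth[var] = "PLP"
        # Anything else (e.g. uncertain significance) is left out.
    return truth


def remove_truth_overlap(source: set[Variant], truth: dict[Variant, str]) -> set[Variant]:
    """Discard variants from another source that also appear in the truth set."""
    return {var for var in source if var not in truth}


# Toy usage with made-up coordinates.
clinvar = {
    Variant("16", 100, "A", "G"): "Benign",
    Variant("16", 200, "C", "T"): "Pathogenic",
    Variant("16", 300, "G", "A"): "Uncertain significance",
}
truth = build_truth_set(clinvar)
other_source = {Variant("16", 100, "A", "G"), Variant("16", 400, "T", "C")}
print(sorted(remove_truth_overlap(other_source, truth)))
# prints [Variant(chrom='16', pos=400, ref='T', alt='C')]
```

Filtering the other sources against the truth set, rather than the reverse, keeps every labeled example while ensuring no variant appears both as a labeled ClinVar example and in the unlabeled gnomAD, Regeneron, or LOVD data.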