Background Copy number variations (CNVs) account for a substantial proportion of inter-individual genomic variation. However, a majority of genomic variation studies have focused on single-nucleotide variations (SNVs), with limited genome-wide analysis of CNVs in large cohorts, especially in populations that are under-represented in genetic studies, including people of African descent. Methods We carried out a genome-wide copy number analysis in > 3400 healthy Bantu Africans from Tanzania. Signal intensity data from high-density (> 2.5 million probes) genotyping arrays were used for CNV calling with three algorithms: PennCNV, DNAcopy, and VanillaICE. Stringent quality metrics and filtering criteria were applied to obtain high-confidence CNVs. Results We identified over 400,000 CNVs larger than 1 kilobase (kb), for an average of 120 CNVs (SE = 2.57) per individual. We detected 866 large CNVs (≥ 300 kb), some of which overlapped genomic regions previously associated with multiple congenital anomaly syndromes, including Prader-Willi/Angelman syndrome (Type 1) and 22q11.2 deletion syndrome. Furthermore, several of the common CNVs seen in our cohort (≥ 5%) overlap genes previously associated with developmental disorders. Conclusions These findings may help refine the phenotypic outcomes and penetrance of variations affecting genes and genomic regions previously implicated in diseases. Our study provides one of the largest datasets of CNVs from individuals of African ancestry, enabling improved clinical evaluation and disease association of CNVs observed in research and clinical studies in African populations.
Similarity in facial characteristics between relatives suggests a strong genetic component underlies facial variation. While there have been numerous studies of the genetics of facial abnormalities and, more recently, single nucleotide polymorphism (SNP) genome-wide association studies (GWASs) of normal facial variation, little is known about the role of genetic structural variation in determining facial shape. In a sample of Bantu African children, we found that only 9% of common copy number variants (CNVs) and 10-kb CNV analysis windows are well tagged by SNPs (r² ≥ 0.8), indicating that associations with our internally called CNVs were not captured by previous SNP-based GWASs. Here, we present a GWAS and gene set analysis of the relationship between normal facial variation and CNVs in a sample of Bantu African children. We report the top five regions, which had p values ≤ 9.35 × 10⁻⁶, and find nominal evidence of independent CNV association (p < 0.05) in three regions previously identified in SNP-based GWASs. The CNV region with strongest association (p = 1.16 × 10⁻⁶, 55 losses and seven gains) contains NFATC1, which has been linked to facial morphogenesis and Cherubism, a syndrome involving abnormal lower facial development. Genomic loss in the region is associated with smaller average lower facial depth. Importantly, new loci identified here were not identified in a SNP-based GWAS, suggesting that CNVs are likely involved in determining facial shape variation. Given the plethora of SNP-based GWASs, calling CNVs from existing data may be a relatively inexpensive way to aid in the study of complex traits.
Objectives Maternal nutrition can alter the offspring epigenome at birth. We sought to examine epigenome-wide DNA methylation (DNAme) from a subset of Guatemalan mother-infant dyads from the Women First Preconception Maternal Nutrition Trial (WF). Women were randomized to one of three arms: Arm 1) a daily maternal nutrition supplement (MNS) from ≥ 3 months prior to conception until delivery; Arm 2) the same MNS starting at 12 weeks gestation until delivery; or Arm 3) no MNS. We tested if infant DNAme from amnion tissue at birth (N = 99) was associated with: 1) timing of exposure to maternal MNS; 2) pre-pregnancy body mass index (ppBMI); and 3) the interaction of maternal MNS and ppBMI. Methods Bisulfite-converted DNAme libraries were constructed using Roche NimbleGen SeqCap Epi CpGiant probes and were sequenced via 2 × 150 paired-end reads. We assessed the associations of Arm, ppBMI, and the Arm × ppBMI interaction with CpG methylation. All statistical models were adjusted for multiple testing using the false discovery rate (FDR) and controlled for maternal age, infant sex, exposure to smoke, infant genetics, and cellular heterogeneity. Gene set enrichment analyses were performed via Enrichr. Results We identified 480 CpGs associated with Arm, 4 CpGs associated with ppBMI, and 22 CpGs associated with the Arm × ppBMI interaction (FDR < 0.05). Further, we found that DNAme differed between Arms (1 vs 2, 1 vs 3). Among CpGs that annotated to genes and passed FDR < 0.05, 300 differed between Arms 1 and 2 and 159 differed between Arms 1 and 3. These results suggest preconception consumption of maternal MNS elicits different epigenetic responses as compared to MNS commencing during gestation or not at all. In addition, CpGs that annotated to genes were enriched in pathways associated with growth, development, and metabolism that included circadian rhythm, TCA cycle, Wnt signaling, and melatonin metabolism.
Conclusions Our findings indicate that maternal MNS was robustly associated with amnion DNAme at birth. More specifically, preconception MNS resulted in DNAme changes that differed from the other Arms in biologically relevant pathways suggesting timing of maternal nutrition impacts the fetal epigenome. Future studies will examine DNAme associated with birth outcomes. Funding Sources Bill & Melinda Gates Foundation and NIH NICHD/ODS.
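The abstract above reports FDR-adjusted results without naming the adjustment procedure. As a hedged illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the example p values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Assumption: the study's FDR procedure is not stated in the abstract;
    Benjamini-Hochberg is used here purely as a common example.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values for four CpG sites.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
significant = [q < 0.05 for q in example_q]
print(example_q, significant)
```

A CpG would then be called significant when its adjusted value falls below the chosen threshold (FDR < 0.05 in the abstract).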
Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. While several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using five continental reference ancestries, African (AFR), Non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), and South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (~84% AFR, ~14% EUR) and American/Latinx (~4% AFR, ~5% EAS, ~43% EUR, ~46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. With an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
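The idea of deconvoluting ancestry from summary allele frequencies can be illustrated with a small sketch. The abstract does not state the objective function or optimizer, so the code below is not Summix itself: it assumes that observed AFs are modeled as a mixture of reference-group AFs, fits the mixture proportions by least squares, and uses projected gradient descent onto the probability simplex as a stand-in optimizer. All names (`project_to_simplex`, `estimate_proportions`) and the simulated data are hypothetical.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(observed, reference, iterations=2000):
    """Least-squares mixture proportions (assumed objective, not Summix's own).

    observed:  list of observed AFs, one per variant
    reference: list of rows, one per variant, each holding one AF per
               reference ancestry group
    """
    n_groups = len(reference[0])
    props = [1.0 / n_groups] * n_groups
    # Step size from the squared Frobenius norm, an upper bound on the
    # Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * sum(x * x for row in reference for x in row))
    for _ in range(iterations):
        grad = [0.0] * n_groups
        for obs, row in zip(observed, reference):
            resid = sum(p * r for p, r in zip(props, row)) - obs
            for k, r in enumerate(row):
                grad[k] += 2.0 * resid * r
        props = project_to_simplex([p - step * g for p, g in zip(props, grad)])
    return props


# Simulated example: 200 variants, 3 hypothetical reference groups.
rng = random.Random(0)
reference = [[rng.random() for _ in range(3)] for _ in range(200)]
true_props = [0.84, 0.14, 0.02]
observed = [sum(p * r for p, r in zip(true_props, row)) for row in reference]
estimate = estimate_proportions(observed, reference)
print([round(p, 3) for p in estimate])
```

In this noise-free simulation the estimate should recover the simulated proportions closely; real summary data would add sampling noise, which is where resampling approaches such as the block bootstrap mentioned in the abstract become relevant.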