An unexpectedly high proportion of SNPs on the X chromosome in the 1000 Genomes Project phase 3 data showed significant sex differences in minor allele frequency (sdMAF). For many of these SNPs, sdMAF persisted in the recently released high-coverage whole-genome sequence data of the 1000 Genomes Project aligned to GRCh38, and it was consistent across the five super-populations. Among the 245,825 common (MAF > 5%) biallelic X-chromosomal SNPs in the phase 3 data presumed to be of high quality, 2,039 have genome-wide significant sdMAF (p-value < 5e-8). The proportion of SNPs with sdMAF varied by region: 0.83% in the non-pseudo-autosomal region (NPR), 0.29% in pseudo-autosomal region 1 (PAR1), 13.1% in PAR2, and 0.85% in the X-transposed region (XTR)/PAR3. These SNPs clustered at the NPR-PAR boundaries, among other locations. sdMAF at the NPR-PAR boundaries is biologically expected due to sex-linkage but has generally been ignored in association studies. For comparison, similar analyses found only 6, 1 and 0 SNPs with significant sdMAF on chromosomes 1, 7 and 22, respectively. Similar sdMAF results for the X chromosome were obtained from the high-coverage whole-genome sequence data of gnomAD v3.1.2 for both the non-Finnish European and the African/African American samples. Future X chromosome analyses need to take sdMAF into account.
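The abstract does not spell out the test statistic, so the following is only a minimal sketch of how a sex difference in allele frequency at an NPR SNP could be tested, not the authors' method. It assumes a pooled two-proportion z-test, treats males as hemizygous (one allele each), and assumes Hardy-Weinberg equilibrium in females; the function name `sdmaf_z_test` and the example counts are hypothetical. Only the 5e-8 threshold comes from the text above.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold quoted in the abstract


def sdmaf_z_test(female_genotypes, male_allele_counts):
    """Pooled two-proportion z-test for a female-vs-male allele frequency difference.

    Simplified illustration (assumption), not the published sdMAF test.
    female_genotypes: (n_ref_hom, n_het, n_alt_hom) genotype counts in females.
    male_allele_counts: (n_ref, n_alt) allele counts in hemizygous males.
    Returns (frequency difference, z statistic, two-sided p-value).
    """
    f0, f1, f2 = female_genotypes
    m0, m1 = male_allele_counts
    female_alleles = 2 * (f0 + f1 + f2)  # two X copies per female
    male_alleles = m0 + m1               # one X copy per male in the NPR
    if female_alleles == 0 or male_alleles == 0:
        raise ValueError("need at least one female and one male")

    female_alt = f1 + 2 * f2
    p_female = female_alt / female_alleles
    p_male = m1 / male_alleles

    # Variance of the difference under the null of equal frequencies.
    pooled = (female_alt + m1) / (female_alleles + male_alleles)
    variance = pooled * (1 - pooled) * (1 / female_alleles + 1 / male_alleles)
    if variance == 0:  # monomorphic site: no difference to test
        return p_female - p_male, 0.0, 1.0

    z = (p_female - p_male) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_female - p_male, z, p_value


# Hypothetical counts: 1,000 females and 1,000 males.
diff, z, p = sdmaf_z_test((250, 500, 250), (800, 200))
print(diff, z, p < GENOME_WIDE_ALPHA)
```

For a PAR SNP, where males also carry two alleles, the male input would be genotype counts handled the same way as the female ones.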
Ten years ago, a detailed analysis of genome-wide association studies showed that only 33% of the studies included the X chromosome. Multiple recommendations were made to combat eXclusion. Here we re-surveyed the research landscape to determine whether these earlier recommendations had been translated into practice. Unfortunately, among the summary statistics reported in 2021 in the NHGRI-EBI GWAS Catalog, only 25% provided results for the X chromosome and 3% for the Y chromosome, suggesting that the eXclusion phenomenon documented earlier not only persists but has also expanded into an eXclusionarY problem. Normalized by the physical length of the chromosome, the average number of studies published up to November 29, 2022 with genome-wide significant findings on the X chromosome is ~1 study/Mb. In contrast, the corresponding figure ranges from ~6 to ~16 studies/Mb for chromosomes 4 and 19, respectively. Compared with the autosomal growth rate of ~0.086 studies/Mb/year over the last decade, studies of the X chromosome grew at less than one-seventh that rate, only ~0.012 studies/Mb/year. Among the studies that reported significant associations on the X chromosome, there was extreme heterogeneity in how they analyzed the data and documented the results, suggesting the need for guidelines. Not surprisingly, among the 430 scores sampled from the Polygenic Score Catalog, none contained weights for sex chromosomal SNPs. To overcome the dearth of sex chromosome analyses, we provide five sets of recommendations and future directions. Finally, we propose that, until the sex chromosomes are included, such whole-genome studies be more properly referred to not as GWAS but as "AWAS", for "autosome-wide scan".
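The length normalization and the rate comparison quoted above come down to simple arithmetic, sketched below. The two growth rates are taken from the abstract; the helper `studies_per_mb`, the study count, and the rounded chromosome length are hypothetical values chosen only to illustrate the calculation.

```python
def studies_per_mb(n_studies, chrom_length_bp):
    """Number of studies per megabase of chromosome length."""
    return n_studies / (chrom_length_bp / 1e6)


# Growth rates quoted in the abstract (studies/Mb/year).
AUTOSOMAL_GROWTH = 0.086
X_GROWTH = 0.012

# Ratio of X-chromosomal to autosomal growth; the abstract describes it
# as "less than one-seventh".
growth_ratio = X_GROWTH / AUTOSOMAL_GROWTH
print(f"X / autosomal growth ratio: {growth_ratio:.3f}")
print(f"less than one-seventh: {growth_ratio < 1 / 7}")

# Hypothetical example: 156 studies on a chromosome of about 156 Mb
# (a rounded, assumed length) gives the ~1 study/Mb scale reported for X.
print(studies_per_mb(156, 156_000_000))
```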