As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enables identification of ~2,000 previously undescribed mobile element insertions, nearly 5 Mb of genomic segments absent from the human genome reference, and over 140 alleles from HLA genes absent from public resources. We reclassify and curate pathogenicity assertions for nearly four hundred variants in genes associated with dominantly inherited Mendelian disorders and calculate the incidence for selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observe that whole-genome and HLA imputation could be significantly improved compared to available datasets since rare variation represents the largest proportion of input from WGS. These results demonstrate that even smaller sample sizes of underrepresented populations bring relevant data for genomic studies, especially when exploring analyses allowed only by WGS.
The original version of this Article contained an error in the title, which was previously incorrectly given as 'Whole-genome sequencing of 1,171 elderly admixed individuals from São Paulo, Brazil'. The correct version removes the words 'São Paulo'. This has been corrected in both the PDF and HTML versions of the Article.
HLA‐B is among the most variable genes in the human genome. This gene encodes a key molecule for antigen presentation to CD8+ T lymphocytes and NK cell modulation. Despite the many studies evaluating its coding region (with an emphasis on exons 2 and 3), few studies have evaluated introns and regulatory sequences in real population samples. Thus, HLA‐B variability is probably underestimated. We applied a bioinformatics pipeline tailored for HLA genes to 5347 samples from 80 different populations, including more than 1000 admixed Brazilians, to evaluate HLA‐B variability (SNPs, indels, MNPs, alleles, and haplotypes) in exons, introns, and regulatory regions. We observed 610 variable sites throughout HLA‐B; the most frequent variants are shared worldwide. However, the haplotype distribution is geographically structured. We detected 920 full‐length haplotypes (exons, introns, and untranslated regions) encoding 239 different protein sequences. HLA‐B gene diversity is higher in admixed populations and Europeans and lower in individuals of African ancestry. Each HLA‐B allele group is associated with specific promoter sequences. This HLA‐B variation resource may improve HLA imputation accuracy and disease‐association studies and provide evolutionary insights regarding HLA‐B genetic diversity in human populations.