Several decades of research have convincingly shown that classical human leukocyte antigen (HLA) loci bear signatures of natural selection. Despite this conclusion, many questions remain regarding the type of selective regime acting on these loci, the timescales over which selection acts, and the functional connections between genetic variability and natural selection. In this review, we argue that genomic datasets, in particular those generated by next-generation sequencing (NGS) at the population scale, are transforming our understanding of HLA evolution. We show that genomewide data can be used to perform robust and powerful tests for selection, capable of identifying both positive and balancing selection at HLA genes. Importantly, these tests have shown that natural selection can be identified at both recent and ancient timescales. We discuss how findings from genomewide association studies affect the evolutionary study of HLA genes, and how genomic data can be used to survey adaptive change involving interactions among multiple loci. We also discuss the methodological developments that are necessary to correctly interpret genomic analyses involving the HLA region. These developments include adapting the NGS analysis framework to deal with highly polymorphic HLA data, as well as developing tools and theory to search for signatures of selection, quantify differentiation, and measure admixture within the HLA region. Finally, we show that high-throughput analysis of molecular phenotypes for HLA genes, namely transcription levels, is now a feasible approach and can add another dimension to the study of genetic variation.
Next-generation sequencing (NGS) technologies have become the standard for data generation in population genomics studies, such as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the human leukocyte antigen (HLA) genes. Because accurate genotype calls and allele frequency estimates are crucial to population genomics analyses, it is important to assess the reliability of NGS data. Here, we evaluate the reliability of genotype calls and allele frequency estimates of the single-nucleotide polymorphisms (SNPs) reported by 1000G (phase I) at five HLA genes (HLA-A, -B, -C, -DRB1, and -DQB1). We take advantage of the availability of HLA Sanger sequencing data for 930 of the 1092 1000G samples and use these data as a gold standard to benchmark the 1000G data. We document that 18.6% of SNP genotype calls in HLA genes are incorrect and that allele frequencies are estimated with an error greater than ±0.1 at approximately 25% of the SNPs in HLA genes. We found a bias toward overestimation of the reference allele frequency in the 1000G data, indicating that mapping bias is an important cause of error in frequency estimation in this dataset. We provide a list of sites with poor allele frequency estimates and discuss the consequences of including those sites in different kinds of analyses. Because the HLA region is the most polymorphic in the human genome, our results provide insights into the challenges of using NGS data in other genomic regions of high diversity.
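The two accuracy measures described above, the fraction of incorrect genotype calls and the per-SNP allele frequency error relative to a Sanger-based gold standard, can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes biallelic SNPs with genotypes coded as non-reference allele counts (0, 1 or 2), the same SNPs and samples (in the same order) in both datasets, and hypothetical function names.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: SNP identifier -> per-sample genotypes coded as the
# number of non-reference alleles (0, 1 or 2), same sample order in both sets.
Genotypes = Dict[str, List[int]]


def genotype_error_rate(ngs: Genotypes, gold: Genotypes) -> float:
    """Fraction of NGS genotype calls that disagree with the gold standard."""
    total = wrong = 0
    for snp, gold_calls in gold.items():
        for called, truth in zip(ngs[snp], gold_calls):
            total += 1
            if called != truth:
                wrong += 1
    return wrong / total if total else 0.0


def alt_allele_frequency(calls: List[int]) -> float:
    """Non-reference allele frequency for diploid genotype calls."""
    return sum(calls) / (2 * len(calls))


def poorly_estimated_snps(ngs: Genotypes, gold: Genotypes,
                          threshold: float = 0.1) -> List[Tuple[str, float]]:
    """Return (SNP, error) pairs whose allele frequency error exceeds threshold.

    A negative error means the NGS data underestimate the non-reference
    allele, i.e. overestimate the reference allele, which is the direction
    expected under reference mapping bias.
    """
    flagged = []
    for snp, gold_calls in gold.items():
        error = alt_allele_frequency(ngs[snp]) - alt_allele_frequency(gold_calls)
        if abs(error) > threshold:
            flagged.append((snp, error))
    return flagged


if __name__ == "__main__":
    # Toy data (invented for illustration): four samples, two SNPs.
    gold = {"snp1": [0, 1, 2, 1], "snp2": [1, 1, 0, 0]}
    ngs = {"snp1": [0, 0, 2, 0], "snp2": [1, 1, 0, 0]}
    print(genotype_error_rate(ngs, gold))    # 0.25
    print(poorly_estimated_snps(ngs, gold))  # [('snp1', -0.25)]
```

In the toy data, two heterozygotes at `snp1` are miscalled as reference homozygotes, so the non-reference allele frequency drops from 0.5 to 0.25; this is the kind of reference-biased error the benchmark is designed to detect.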
When humans moved from Asia toward the Americas more than 18,000 years ago and eventually peopled the New World, they encountered a new environment with extreme climate conditions and distinct dietary resources. These environmental and dietary pressures may have led to instances of genetic adaptation with the potential to influence phenotypic variation in extant Native American populations. An example of such an event is the evolution of the fatty acid desaturase (FADS) genes, which have been claimed to harbor signals of positive selection in Inuit populations due to adaptation to the cold Greenland Arctic climate and to a protein-rich diet. Because there was evidence of intercontinental variation in this genetic region, with indications of positive selection for its variants, we decided to compare the Inuit findings with other Native American data. Here, we use several lines of evidence to show that the signal of positive selection at FADS is not restricted to the Arctic but is instead broadly observed throughout the Americas. The shared signature of selection among populations living in such a diverse range of environments is likely due to a single, strong instance of local adaptation that took place in the common ancestral population before their entrance into the New World. These first Americans peopled the whole continent and spread this adaptive variant across a diverse set of environments.
The origin of syphilis is still controversial, and different research avenues explore its fascinating history. Here we employed a new integrative approach that combines paleopathology and molecular analyses. As an exercise to test the validity of this approach, we examined different hypotheses on the origin of syphilis and other human diseases caused by treponemes (treponematoses). First, we constructed a worldwide map containing all accessible reports of paleopathological evidence of treponematoses before Columbus's return to Europe. Then, we selected the oldest of these reports to calibrate the time to the most recent common ancestor of Treponema pallidum subsp. pallidum, T. pallidum subsp. endemicum and T. pallidum subsp. pertenue in phylogenetic analyses of 21 previously reported genetic regions from different T. pallidum strains. Finally, we estimated the treponemes' evolutionary rate to test three scenarios: A) that treponematoses have accompanied human evolution since Homo erectus; B) that venereal syphilis arose very recently from less virulent strains acquired in the New World about 500 years ago; and C) that it emerged in the Americas between 16,500 and 5,000 years ago. Two of the resulting evolutionary rates were implausible and could not account for the existing osseous evidence. Thus, treponematoses, as we know them today, did not emerge with H. erectus, nor did venereal syphilis appear only five centuries ago. However, considering 16,500 years before present (yBP) as the time of the first colonization of the Americas, and approximately 5,000 yBP as the oldest probable evidence of venereal syphilis in the world, we could not entirely reject hypothesis C. Syphilis therefore seems to have emerged in this time span, since the resulting evolutionary rate is compatible with those observed in other bacteria. On the other hand, if claims of pre-Columbian venereal syphilis outside the Americas are taken into account, the place of origin remains unresolved.
Finally, the endeavor of joining paleopathology and phylogenetics proved to be a fruitful and promising approach for the study of infectious diseases.
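Why the calibration date constrains the estimated evolutionary rate can be illustrated with a simplified strict-clock relation. This is only an illustration, not the phylogenetic analysis used in the study, and the symbols $d$ (pairwise divergence in substitutions per site), $\mu$ (substitution rate per site per year) and $T$ (time since the most recent common ancestor, in years) are introduced here for convenience. Since both lineages accumulate roughly $\mu T$ substitutions after splitting:

```latex
d = 2\mu T
\quad\Longrightarrow\quad
\mu = \frac{d}{2T},
\qquad
\frac{\mu_{B}}{\mu_{C}} = \frac{T_{C}}{T_{B}}
= \frac{16\,500\ \text{y}}{500\ \text{y}} = 33 .
```

For the same observed divergence $d$, moving the calibration from 16,500 years (the older bound of scenario C) to about 500 years (scenario B) would require a rate roughly 33 times faster, which shows how very recent origin dates can imply implausibly high rates.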
Although whole-genome sequencing (WGS) is becoming the gold standard tool for population genomics and medical applications, data on diverse non-European and admixed individuals remain scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enables the identification of ~2,000 previously undescribed mobile element insertions, nearly 5 Mb of genomic segments absent from the human reference genome, and over 140 HLA alleles absent from public resources. We reclassify and curate pathogenicity assertions for nearly four hundred variants in genes associated with dominantly inherited Mendelian disorders and calculate the incidence of selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observe that whole-genome and HLA imputation could be significantly improved relative to currently available reference datasets, since rare variation represents the largest proportion of the input from WGS. These results demonstrate that even relatively small samples from underrepresented populations provide relevant data for genomic studies, especially for analyses that only WGS makes possible.
Biological networks pervade nature. They describe systems throughout all levels of biological organization, from molecules regulating metabolism to species interactions that shape ecosystem dynamics. Network thinking has revealed recurrent organizational patterns in complex biological systems, such as the formation of semi-independent groups of connected elements (modularity) and non-random distributions of interactions among elements. Other structural patterns, such as nestedness, have been assessed primarily in ecological networks formed by two non-overlapping sets of elements; information on their occurrence at other levels of organization is lacking. Nestedness occurs when the interactions of less connected elements form proper subsets of the interactions of more connected elements. Only recently have these properties begun to be appreciated in one-mode networks (in which all elements can interact), which describe a much wider variety of biological phenomena. Here, we compute nestedness in a diverse collection of one-mode networked systems from six different levels of biological organization, depicting gene and protein interactions, complex phenotypes, animal societies, metapopulations, food webs and vertebrate metacommunities. Our findings suggest that nestedness emerges independently of interaction type or biological scale, and reveal that disparate systems can share nested organization features characterized by inclusive subsets of interacting elements with decreasing connectedness. We first explore the implications of a nested structure for each of the studied systems, then theorize on how nested networks are assembled. We hypothesize that nestedness emerges across scales due to processes that, although system-dependent, may share a general trade-off between two features: specificity (the number of interactions the elements of the system can have) and affinity (how these elements can be connected to each other).
Our findings, which suggest that nestedness occurs throughout biological scales, may stimulate debate on how pervasive nestedness is in nature, while the emergent theoretical principles can aid further research on commonalities of biological networks.
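Nestedness, as defined above, can be quantified in several ways. One widely used metric is NODF (nestedness based on overlap and decreasing fill); the abstract does not state which metric or adaptation was used, so the following is only a sketch assuming NODF applied to a symmetric, binary one-mode adjacency matrix with no self-interactions. Because such a matrix equals its transpose, row pairs and column pairs give identical scores, so averaging over row pairs is enough. The function name `nodf_one_mode` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def nodf_one_mode(matrix: Sequence[Sequence[int]]) -> float:
    """NODF (0-100) for a symmetric binary adjacency matrix without self-loops.

    For each pair of elements with strictly different degrees, the score is
    the percentage of the less connected element's partners that are also
    partners of the more connected element. Pairs with equal degrees, or
    where the less connected element has no partners, score zero.
    """
    n = len(matrix)
    for i in range(n):
        if matrix[i][i]:
            raise ValueError("self-interactions are not allowed")
        for j in range(n):
            if matrix[i][j] != matrix[j][i]:
                raise ValueError("one-mode matrix must be symmetric")

    # Partner set of each element (its row of the adjacency matrix).
    partners: List[set] = [{j for j, v in enumerate(row) if v} for row in matrix]
    pairs = list(combinations(range(n), 2))
    if not pairs:
        return 0.0

    total = 0.0
    for a, b in pairs:
        dense, sparse = partners[a], partners[b]
        if len(sparse) > len(dense):
            dense, sparse = sparse, dense
        # Decreasing fill: only pairs with strictly different degrees count.
        if len(dense) > len(sparse) > 0:
            total += 100.0 * len(dense & sparse) / len(sparse)
    return total / len(pairs)


if __name__ == "__main__":
    # Toy network (invented): element 0 is a hub, element 3 a peripheral node.
    net = [
        [0, 1, 1, 1],
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
    ]
    print(nodf_one_mode(net))  # 50.0
```

In practice, an observed NODF value is usually compared against scores from randomized networks (null models) before a network is called nested; that step is omitted here.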
Despite the high number of individuals worldwide infected by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) who develop coronavirus disease 2019 (COVID-19) symptoms, many exposed individuals remain asymptomatic and/or uninfected and seronegative. This could be explained by a combination of environmental (exposure), immunological (previous infection), epigenetic, and genetic factors. To identify genetic factors involved in the immune response in symptomatic COVID-19 compared with asymptomatic exposed individuals, we analyzed 83 Brazilian couples in which one partner was infected and symptomatic while the other remained asymptomatic and seronegative for at least 6 months despite sharing the same bedroom during the infection. We refer to these as “discordant couples”. We performed whole-exome sequencing followed by a state-of-the-art method to call genotypes and haplotypes across the highly polymorphic major histocompatibility complex (MHC) region. The discordant partners had comparable ages and genetic ancestry, but women were overrepresented (65%) in the asymptomatic group. In the antigen-presentation pathway, HLA-DRB1 alleles encoding Lys at residue 71 (mostly DRB1*03:01 and DRB1*04:01) and DOB*01:02 were associated with symptomatic infection, whereas HLA-A alleles encoding 144Q/151R were associated with asymptomatic seronegative women. Among the genes related to immune modulation, we detected variants in MICA and MICB associated with symptomatic infection. These variants are related to higher expression of soluble MICA and lower expression of MICB. Thus, quantitative differences in these molecules, which modulate natural killer (NK) cell activity, could contribute to susceptibility to COVID-19 by downregulating NK cell cytotoxic activity in infected individuals but not in their asymptomatic partners.