Background: Harnessing, in a phylogenetic context, the vast amounts of genomic data produced by massive sequencing of multiple closely related genomes requires new tools and approaches. We present a tool for the genome-wide analysis of frequencies and patterns of amino acid substitutions in multiple alignments of gene coding regions, and a database of amino acid substitutions in the phylogeny of 12 Drosophila genomes. We illustrate the use of these resources to address three types of evolutionary genomics questions: fluxes in the amino acid composition of proteins, asymmetries in amino acid substitutions, and patterns of molecular evolution in duplicated genes. Results: We demonstrate that the amino acid composition of Drosophila proteins underwent a significant shift over the last 70 million years encompassed by the studied phylogeny, with less common amino acids (Cys, Met, His) increasing in frequency and more common ones (Ala, Leu, Glu) becoming less frequent. These fluxes are strongly correlated with the polarity of the source and destination amino acids, resulting in an overall systematic decrease in the mean polarity of amino acids in Drosophila proteins. The frequency and radicality of amino acid substitutions are higher in paralogs than in orthologous single-copy genes, and higher in gene families with paralogs than in gene families without surviving duplications. As expected, the rate and radicality of substitutions are negatively correlated with the overall level and uniformity of gene expression. However, these correlations are not observed for substitutions in duplicated genes, indicating a different selective constraint on the evolution of paralogous sequences. Clades resulting from duplications show a marked asymmetry in the rate and radicality of amino acid substitutions, possibly a signal of widespread neofunctionalization.
These patterns differ among functional classes of protein families; genes encoding RNA-binding proteins differ from most other functional groups in their amino acid substitution patterns in both duplicated and single-copy genes. Conclusions: We demonstrate that deep phylogenetic analysis of amino acid substitutions can reveal interesting genome-wide patterns. The amino acid composition of drosophilid proteins is shaped by fluxes similar to those previously observed in prokaryotic, yeast and mammalian genomes, indicating patterns that are present across diverse taxa. The increased frequency and radicality of amino acid substitutions in duplicated genes, together with the asymmetry of these parameters between paralogous clades, indicate widespread neofunctionalization among paralogs as the mechanism of duplication retention.
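The composition fluxes above amount to counting, for each amino acid, how many substitutions gained it minus how many lost it, and weighting substitutions by the polarity difference between destination and source residues. A minimal sketch, assuming a hypothetical input that maps (source, destination) pairs to substitution counts summed over the tree, and using Grantham's (1974) polarity values for the six residues named in the abstract as an example scale (the abstract does not say which scale the study used):

```python
from typing import Dict, Tuple

# Hypothetical input format: counts[(src, dst)] is the number of inferred
# src -> dst amino acid substitutions summed over all branches of a tree.
Counts = Dict[Tuple[str, str], int]

# Example polarity scale: Grantham (1974) values for the six residues named
# in the abstract. This is an assumption, not necessarily the study's scale.
POLARITY = {"A": 8.1, "L": 4.9, "E": 12.3, "C": 5.5, "M": 5.7, "H": 10.4}


def net_fluxes(counts: Counts) -> Dict[str, int]:
    """Return gains minus losses for every amino acid that appears in counts."""
    flux: Dict[str, int] = {}
    for (src, dst), n in counts.items():
        flux[src] = flux.get(src, 0) - n  # each substitution removes one src residue
        flux[dst] = flux.get(dst, 0) + n  # ...and adds one dst residue
    return flux


def mean_polarity_change(counts: Counts, polarity: Dict[str, float]) -> float:
    """Average polarity change (destination minus source) per substitution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    delta = sum(n * (polarity[dst] - polarity[src]) for (src, dst), n in counts.items())
    return delta / total


if __name__ == "__main__":
    # Toy counts, not data from the study.
    toy = {("A", "C"): 3, ("E", "M"): 2, ("C", "A"): 1}
    print(net_fluxes(toy))  # {'A': -2, 'C': 2, 'E': -2, 'M': 2}
    print(round(mean_polarity_change(toy, POLARITY), 3))  # -3.067
```

A negative mean polarity change corresponds to the kind of systematic polarity decrease described above; a real analysis would also need per-branch counts and a null expectation to judge significance.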
Background: Duplicated genes can persist in genomes indefinitely if both copies retain the original function because of a dosage benefit (gene conservation), if one copy assumes a novel function (neofunctionalization), if both copies become required to perform the function previously accomplished by a single copy (subfunctionalization), or through a combination of these mechanisms. Different models of duplication retention make different predictions about substitution rates in the coding portions of paralogs and about the asymmetry of these rates. Results: We analyse sequence evolution asymmetry in paralogs present in 12 Drosophila genomes, using the nearest non-duplicated orthologous outgroup as a reference. For the paralogs present in D. melanogaster, we also analyse the asymmetry of expression rate and ubiquity and of segregating non-synonymous polymorphisms. Paralogs accumulate substitutions, on average, faster than their nearest singleton orthologs. The distribution of paralogs’ substitution rate asymmetry is overdispersed relative to that of orthologous clades, containing disproportionately more unusually symmetric and unusually asymmetric clades. We show that paralogs are more asymmetric in: a) clades orthologous to highly constrained singleton genes; b) genes with high expression level; c) genes with ubiquitous expression; and d) non-tandem duplications. We further demonstrate that, within asymmetrically evolving paralog pairs, the faster-evolving member tends to have a lower average expression rate, lower expression uniformity and a higher frequency of non-synonymous SNPs than its slower-evolving counterpart. Conclusions: Our findings are consistent with the hypothesis that many duplications in Drosophila are retained despite stabilising selection being more relaxed in one paralog than in the other, suggesting widespread unfinished pseudogenization.
This phenomenon is likely to make signatures of neo- and subfunctionalization difficult to detect, because these models of duplication retention also predict asymmetries in substitution rates and expression profiles. Reviewers: This article has been reviewed by Dr. Jia Zeng (nominated by Dr. I. King Jordan), Dr. Fyodor Kondrashov and Dr. Yuri Wolf.
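Using a non-duplicated outgroup as the reference follows relative-rate logic: the branch shared by both paralogs up to the outgroup contributes equally to both outgroup distances, so a difference between them reflects unequal change after the duplication. A minimal sketch, assuming simple p-distances on a gap-containing protein alignment and a hypothetical normalized index |d1 - d2| / (d1 + d2); this is an illustration of the idea, not the measure used in the study:

```python
from typing import Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences (gaps skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        diffs += int(x != y)
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return diffs / compared


def paralog_asymmetry(outgroup: str, para1: str, para2: str) -> Tuple[float, int]:
    """Hypothetical relative-rate style asymmetry of two paralogs vs. an outgroup.

    Returns (index, faster): index = |d1 - d2| / (d1 + d2), in [0, 1];
    faster is 1 or 2 for the faster-evolving paralog, or 0 for a tie.
    """
    d1 = p_distance(outgroup, para1)
    d2 = p_distance(outgroup, para2)
    if d1 + d2 == 0:
        return 0.0, 0
    faster = 1 if d1 > d2 else 2 if d2 > d1 else 0
    return abs(d1 - d2) / (d1 + d2), faster


if __name__ == "__main__":
    # Toy protein alignment, not data from the study.
    index, faster = paralog_asymmetry("MKLVAT", "MKLVAS", "MRIVGS")
    print(f"asymmetry={index:.2f}, faster paralog={faster}")  # asymmetry=0.60, faster paralog=2
```

Because the shared branch inflates both distances, this index is conservative; estimating branch lengths on a tree would isolate the post-duplication branches more cleanly.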
Amino acid frequencies in proteins may not be at equilibrium. We consider two possible explanations for the nonzero net residue fluxes in drosophilid proteins. First, protein interiors may have a suboptimal residue composition and be under selective pressure favoring stability, that is, favoring the loss of polar (and the gain of large) amino acids; one would then expect stronger net fluxes in the protein interior than at exposed sites. Alternatively, if most of the polarity loss occurs at exposed sites and the selective constraint on amino acid composition at such sites decreases over time, the net loss of polarity may be neutral, caused by the disproportionately high occurrence of polar residues at exposed, least constrained sites. We estimated the net evolutionary fluxes of residue polarity and volume at sites with different solvent accessibility in conserved protein families from 12 Drosophila species. A net loss of polarity, minuscule in magnitude but consistent across all lineages, occurred at all sites except the most exposed ones, where the net flux of polarity was close to zero or, in membrane proteins, even positive. At intermediate solvent accessibility, the net fluxes of polarity and volume were similar to neutral predictions, whereas much of the polarity loss not attributable to neutral expectations occurred at buried sites. These observations are consistent with the hypothesis that residue composition in many proteins is structurally suboptimal and continues to evolve toward lower polarity in the protein interior, in particular in proteins with intracellular localization. The magnitude of polarity and volume changes was independent of the protein’s evolutionary age, indicating that the approach to equilibrium has been slow or that no single such equilibrium exists.
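Stratifying the polarity flux by solvent accessibility means assigning each substitution to an accessibility class and averaging the polarity change within each class. A minimal sketch, assuming substitutions arrive as (source, destination, relative solvent accessibility) triples, hypothetical cut-offs of 0.2 and 0.5 between buried, intermediate and exposed sites, and Grantham (1974) polarity values for a handful of residues as an example scale; none of these choices are taken from the study:

```python
from typing import Dict, List, Tuple

# Example polarity scale: Grantham (1974) values for a few residues (assumption).
POLARITY = {"A": 8.1, "L": 4.9, "E": 12.3, "K": 11.3, "S": 9.2, "V": 5.9}

# Hypothetical relative-solvent-accessibility (RSA) classes: (name, low, high).
BINS = [("buried", 0.0, 0.2), ("intermediate", 0.2, 0.5), ("exposed", 0.5, float("inf"))]


def accessibility_class(rsa: float) -> str:
    """Return the name of the RSA class containing rsa (low bound inclusive)."""
    for name, lo, hi in BINS:
        if lo <= rsa < hi:
            return name
    raise ValueError(f"RSA out of range: {rsa}")


def net_polarity_flux(subs: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Mean polarity change (dst minus src) per substitution in each RSA class."""
    totals = {name: 0.0 for name, _, _ in BINS}
    counts = {name: 0 for name, _, _ in BINS}
    for src, dst, rsa in subs:
        cls = accessibility_class(rsa)
        totals[cls] += POLARITY[dst] - POLARITY[src]
        counts[cls] += 1
    # Classes without any substitutions are omitted rather than reported as 0.
    return {cls: totals[cls] / counts[cls] for cls in totals if counts[cls] > 0}


if __name__ == "__main__":
    # Toy substitutions, not data from the study.
    toy = [("S", "V", 0.05), ("E", "L", 0.1), ("A", "S", 0.3), ("A", "E", 0.7), ("E", "K", 0.8)]
    print({k: round(v, 2) for k, v in net_polarity_flux(toy).items()})
    # {'buried': -5.35, 'intermediate': 1.1, 'exposed': 1.6}
```

Comparing such per-class values against the values expected under neutral substitution (for example, from a model fitted to the same sites) is what separates the neutral and selective explanations discussed above.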
Extreme disease phenotypes have the potential to provide key pathophysiologic insights, but the study of these conditions is challenging due to their rarity and the limited statistical power of existing methods. Herein, we apply a novel pathway-based approach to investigate the role of rare genomic variants in infectious purpura fulminans (PF), an extreme phenotype of sepsis with hyperinflammation and coagulopathy for which the role of inherited risk factors is currently unknown. Using whole exome sequencing, we found a significantly increased burden of rare, putatively function-altering coding variants in the complement system in patients with PF compared to unselected patients with sepsis (p-value = 0.01). Functional characterization of a subset of PF-associated variants in integrin complement receptors 3 and 4 (CR3 and CR4) revealed that they exhibit a pro-inflammatory phenotype. Our results suggest that rare inherited defects in the complement system predispose individuals to the maladaptive hyperinflammatory response that characterizes severe sepsis.
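The abstract does not define "rare" or "putatively function-altering", so the following is only a minimal sketch of a collapsing step under stated assumptions: a hypothetical `Variant` record, a hypothetical allele-frequency cut-off of 1%, an illustrative list of consequence terms, and a gene set limited to the three complement receptor genes named in this section (ITGAM, ITGAX, ITGB2) rather than the full complement pathway. Each sample is reduced to a single carrier flag:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Variant:
    sample: str
    gene: str
    consequence: str    # e.g. "missense", "synonymous"
    allele_freq: float  # population allele frequency


# Illustrative gene set only; the full complement-system list is not given here.
GENE_SET = {"ITGAM", "ITGAX", "ITGB2"}
# Hypothetical definition of "putatively function-altering".
DAMAGING = {"missense", "stop_gained", "frameshift", "splice_donor", "splice_acceptor"}
MAX_AF = 0.01  # hypothetical rarity threshold


def qualifies(v: Variant) -> bool:
    """True for a rare, putatively function-altering variant in the gene set."""
    return v.gene in GENE_SET and v.consequence in DAMAGING and v.allele_freq < MAX_AF


def collapse(variants: Iterable[Variant], samples: Iterable[str]) -> Dict[str, int]:
    """Map each sample to 1 if it carries at least one qualifying variant, else 0."""
    carriers: Set[str] = {v.sample for v in variants if qualifies(v)}
    return {s: int(s in carriers) for s in samples}


if __name__ == "__main__":
    # Toy variants, not data from the study.
    toy = [
        Variant("p1", "ITGAM", "missense", 0.001),   # qualifies
        Variant("p2", "ITGB2", "synonymous", 0.0005),  # not function-altering
        Variant("p3", "ITGAX", "missense", 0.2),     # too common
        Variant("p4", "F5", "missense", 0.001),      # outside the gene set
    ]
    flags = collapse(toy, ["p1", "p2", "p3", "p4", "p5"])
    print(flags, "carriers:", sum(flags.values()))
```

Pooling rare variants across a pathway into one carrier flag trades per-gene resolution for statistical power, which is the usual motivation for collapsing when individual variants are too rare to test one at a time.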
Background: Purpura fulminans (PF) is a rare but devastating complication of sepsis in which patients experience overwhelming systemic thrombosis leading to end-organ damage, distal limb ischemia, and death. Survivors are often left with lifelong disfigurement and/or disability due to severe acral tissue injury and the need for amputation of necrotic extremities. Given that most patients who develop PF have no known underlying medical conditions and are much younger than those typically afflicted with sepsis, we sought to investigate whether genetic determinants predispose individuals to developing PF. Methods: We performed whole exome sequencing on the Boston PF Cohort (N=40). We used pathway-based collapsing analysis (PBCA) combined with mutational burden testing to assess enrichment of rare coding variants in the complement system, using unselected patients with sepsis from the NHLBI ARDSnet iSPAAR cohort (N=87) as the comparator. Several novel variants in integrin complement receptors were identified and subsequently cloned and functionally characterized. Results: Patients in the Boston PF Cohort were relatively young at presentation, with a median (IQR) age of 37.5 (20.3-58.8) years. Infections were caused by gram-negative organisms in 45% of patients and by gram-positive organisms in 27.5%; in the remaining 27.5%, no pathogen was identified. PF patients had strikingly abnormal markers of severe sepsis and coagulopathy, including the following median (IQR) values: lactate 7.1 (5.3-11.8) mmol/L, platelet count 28,000 (15,000-48,500) per μL, aPTT 68.0 (48.3-142.5) seconds, INR 2.8 (1.9-4.0), and protein C activity 26.0% (13.5-38.5). No patient had a known congenital immune defect. Whole exome sequencing (mean read depth 80X) identified 30 unique heterozygous complement system variants in 26/40 (65%) patients with PF (Figure 1A-B).
Variants in this pathway were highly enriched in PF patients compared with the control cohort (P<0.001, Figure 1C), and the signal remained significant (P=0.02) in a sensitivity analysis restricted to Caucasian patients. By contrast, no enrichment of variants was found in the coagulation system (P=0.51, Figure 1D), and no difference in complement system variants was detected between the two cohorts when only synonymous variants were considered. The number of complement system variants per patient correlated positively with the Sequential Organ Failure Assessment (SOFA) score at presentation (P=0.03) and independently predicted the presenting SOFA score in a multivariable regression model (P=0.04). Eleven unique variants were identified in ITGAM, ITGAX, and ITGB2, the genes encoding integrin complement receptors 3 and 4 (CR3 and CR4), which have anti- and pro-inflammatory properties, respectively. Because little is known about the function of these receptors in sepsis, we performed additional in vitro studies and found that 6/8 (75%) CR3 variants resulted in partial or complete loss of function, while 3/7 (42.9%) CR4 variants unexpectedly resulted in gain of function. Taken together, these results suggest that, in PF patients, the immune response to microbial infection involves enhanced pro-inflammatory signaling through CR3 and CR4. Conclusions: Our data suggest that rare inherited defects in the complement system predispose individuals to the maladaptive thrombo-inflammatory response to infection that characterizes PF. By advancing our understanding of the molecular pathogenesis of PF, this work could provide important insights into the broader problem of coagulopathy in sepsis. Furthermore, targeted gene discovery approaches based on PBCA may serve as a model for future genomic studies of rare hemostatic diseases. Disclosures: No relevant conflicts of interest to declare.
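The abstract does not state which statistic underlies the PBCA burden test. As one common option for a collapsing burden test, the sketch below compares the proportion of carriers between cases and controls with a two-sided Fisher's exact test, using only the standard library; the function name and the toy counts are hypothetical, and this is not presented as the authors' procedure:

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are carriers and non-carriers.
    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is no more probable than the observed one.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Probability that x of the col1 carriers fall in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical carrier counts, not data from the study:
    # 20 of 40 cases and 15 of 90 controls carry >= 1 qualifying variant.
    print(fisher_exact_two_sided(20, 20, 15, 75))
```

Running the same test on carriers of synonymous variants only, or on a different pathway such as coagulation, gives the kind of negative-control comparison described above; an enrichment that also appears for synonymous variants would point to technical or ancestry differences between cohorts rather than biology.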