Computational efforts to identify functional elements within genomes leverage comparative sequence information by looking for regions that exhibit evidence of selective constraint. One way of detecting constrained elements is to follow a bottom-up approach by computing constraint scores for individual positions of a multiple alignment and then defining constrained elements as segments of contiguous, highly scoring nucleotide positions. Here we present GERP++, a new tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. Using GERP++ we identify over 1.3 million constrained elements spanning over 7% of the human genome. We predict a higher fraction than earlier estimates largely due to the annotation of longer constrained elements, which improves one-to-one correspondence between predicted elements and known functional sequences. GERP++ is an efficient and effective tool to provide both nucleotide- and element-level constraint scores within deep multiple sequence alignments.
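The generic bottom-up idea described in this abstract (score each alignment column, then group contiguous high-scoring positions into elements) can be illustrated with a minimal sketch. This is not GERP++'s dynamic programming method or its significance ranking; it is the simpler thresholded run-finding that such approaches start from. The function name `constrained_segments`, the `threshold` and `min_len` parameters, and the example scores are all hypothetical.

```python
from typing import List, Tuple


def constrained_segments(
    scores: List[float], threshold: float, min_len: int = 1
) -> List[Tuple[int, int, float]]:
    """Group contiguous positions scoring >= threshold into candidate elements.

    Returns (start, end_exclusive, summed_score) for each run that is at
    least min_len positions long. Illustrative only: a plain threshold
    scan, not the GERP++ algorithm.
    """
    segments: List[Tuple[int, int, float]] = []
    start = None  # index where the current high-scoring run began, if any
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at position i - 1; keep it if long enough.
            if i - start >= min_len:
                segments.append((start, i, sum(scores[start:i])))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores), sum(scores[start:])))
    return segments


# Hypothetical per-position constraint scores for nine alignment columns.
example_scores = [0.5, 2.0, 3.0, 2.5, -1.0, 2.25, 0.0, 2.0, 2.5]
example_segments = constrained_segments(example_scores, threshold=2.0, min_len=2)
```

With these made-up scores, the isolated high-scoring position at index 5 is dropped because it is shorter than `min_len`, which shows why fixed thresholds and length cutoffs are a crude way to place element boundaries.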
Bacterial pathogenicity islands (PAI) often encode both effector molecules responsible for disease and secretion systems that deliver these effectors to host cells. Human enterohemorrhagic Escherichia coli (EHEC), enteropathogenic E. coli, and the mouse pathogen Citrobacter rodentium (CR) possess the locus of enterocyte effacement (LEE) PAI. We systematically mutagenized all 41 CR LEE genes and functionally characterized these mutants in vitro and in a murine infection model. We identified 33 virulence factors, including two virulence regulators and a hierarchical switch for type III secretion. In addition, 7 potential type III effectors encoded outside the LEE were identified by using a proteomics approach. These non-LEE effectors are encoded by three uncharacterized PAIs in EHEC O157, suggesting that these PAIs act cooperatively with the LEE in pathogenesis. Our findings provide significant insights into bacterial virulence mechanisms and disease. Diarrheagenic enterohemorrhagic Escherichia coli (EHEC), enteropathogenic E. coli (EPEC), and Citrobacter rodentium (CR) are attaching/effacing (A/E) bacterial pathogens that attach to host intestinal epithelium and efface brush border microvilli, forming A/E lesions (1, 2). EHEC and EPEC represent a significant threat to human health. Sequencing the genome of EHEC O157:H7, the causative agent of "hamburger disease" and the most common serotype associated with food and water poisoning, has identified many putative virulence factors (3). These factors are often encoded by pathogenicity islands (PAI) present in the genomes of pathogenic, but not closely related nonpathogenic, strains (4). However, the functions of the PAIs in virulence have not been systematically analyzed. Many key virulence factors shared by A/E pathogens reside in the locus of enterocyte effacement (LEE), a PAI essential for A/E lesion formation (5-8).
The LEE contains 41 genes and encodes a type III secretion system (TTSS), a common virulence mechanism for many human and plant pathogens (4, 9, 10). TTSSs are conserved organelles that deliver bacterial effector proteins capable of modulating host functions into host cells. The LEE encodes proteins for forming such an organelle (2), but the LEE genes involved in assembling and regulating this apparatus have not been defined. The LEE also encodes a regulator (Ler), an adhesin (intimin) and its receptor (Tir) responsible for intimate attachment, several secreted proteins, and their chaperones (1, 2). The secreted proteins consist of effectors as well as translocators (EspA, EspD, and EspB) required for translocating effectors into host cells. Five LEE-encoded effectors (Tir, EspG, EspF, Map, and EspH) have been identified, which are involved in modulating host cytoskeleton (2, 11). However, nearly half of the LEE genes have no homologs and have not been functionally studied. Because EHEC and EPEC are human pathogens, efforts aimed at elucidating the function of the LEE have primarily been restricted to in vitro studies. Animal models, including neonatal...
Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.
Summary: Enterohaemorrhagic Escherichia coli (EHEC) O157:H7 uses a specialized protein translocation apparatus, the type III secretion system (TTSS), to deliver bacterial effector proteins into host cells. These effectors interfere with host cytoskeletal pathways and signalling cascades to facilitate bacterial survival and replication and promote disease. The genes encoding the TTSS and all known type III secreted effectors in EHEC are localized in a single pathogenicity island on the bacterial chromosome known as the locus for enterocyte effacement (LEE). In this study, we performed a proteomic analysis of proteins secreted by the LEE-encoded TTSS of EHEC. In addition to known LEE-encoded type III secreted proteins, such as EspA, EspB and Tir, a novel protein, NleA (non-LEE-encoded effector A), was identified. NleA is encoded in a prophage-associated pathogenicity island within the EHEC genome, distinct from the LEE. The LEE-encoded TTSS directs translocation of NleA into host cells, where it localizes to the Golgi apparatus. In a panel of strains examined by Southern blot and database analyses, nleA was found to be present in all other LEE-containing pathogens examined, including enteropathogenic E. coli and Citrobacter rodentium, and was absent from nonpathogenic strains of E. coli and non-LEE-containing pathogens. NleA was determined to play a key role in virulence of C. rodentium in a mouse infection model.
Tumors of distinct tissues of origin and genetic makeup display common hallmark cellular phenotypes, including sustained proliferation, suppression of cell death, and altered metabolism. These phenotypic commonalities have been proposed to stem from disruption of conserved regulatory mechanisms evolved during the transition to multicellularity to control fundamental cellular processes such as growth and replication. Dating the evolutionary emergence of human genes through phylostratigraphy uncovered close association between gene age and expression level in RNA sequencing data from The Cancer Genome Atlas for seven solid cancers. Genes conserved with unicellular organisms were strongly up-regulated, whereas genes of metazoan origin were primarily inactivated. These patterns were most consistent for processes known to be important in cancer, implicating both selection and active regulation during malignant transformation. The coordinated expression of strongly interacting multicellularity and unicellularity processes was lost in tumors. This separation of unicellular and multicellular functions appeared to be mediated by 12 highly connected genes, marking them as important general drivers of tumorigenesis. Our findings suggest that common principles closely tied to the evolutionary history of genes underlie convergent changes at the cellular process level across a range of solid cancers. We propose that altered activity of genes at the interfaces between multicellular and unicellular regions of human gene regulatory networks activates primitive transcriptional programs, driving common hallmark features of cancer. Manipulation of cross-talk between biological processes of different evolutionary origins may thus present powerful and broadly applicable treatment strategies for cancer.
The first sequenced marsupial genome promises to reveal unparalleled insights into mammalian evolution. We have used the Monodelphis domestica (gray short-tailed opossum) sequence to construct the first map of a marsupial major histocompatibility complex (MHC). The MHC is the most gene-dense region of the mammalian genome and is critical to immunity and reproductive success. The marsupial MHC bridges the phylogenetic gap between the complex MHC of eutherian mammals and the minimal essential MHC of birds. Here we show that the opossum MHC is gene dense and complex, as in humans, but shares more organizational features with non-mammals. The Class I genes have amplified within the Class II region, resulting in a unique Class I/II region. We present a model of the organization of the MHC in ancestral mammals and its elaboration during mammalian evolution. The opossum genome, together with other extant genomes, reveals the existence of an ancestral "immune supercomplex" that contained genes of both types of natural killer receptors together with antigen processing genes and MHC genes.
Non-genetic drug resistance is increasingly recognised in various cancers. Molecular insights into this process are lacking and it is unknown whether stable non-genetic resistance can be overcome. Using single cell RNA-sequencing of paired drug naïve and resistant AML patient samples and cellular barcoding in a unique mouse model of non-genetic resistance, here we demonstrate that transcriptional plasticity drives stable epigenetic resistance. With a CRISPR-Cas9 screen we identify regulators of enhancer function as important modulators of the resistant cell state. We show that inhibition of Lsd1 (Kdm1a) is able to overcome stable epigenetic resistance by facilitating the binding of the pioneer factor, Pu.1 and cofactor, Irf8, to nucleate new enhancers that regulate the expression of key survival genes. This enhancer switching results in the re-distribution of transcriptional co-activators, including Brd4, and provides the opportunity to disable their activity and overcome epigenetic resistance. Together these findings highlight key principles to help counteract non-genetic drug resistance.