The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Summary: Small RNAs target invaders for silencing in the CRISPR-Cas pathways that protect bacteria and archaea from viruses and plasmids. The CRISPR RNAs (crRNAs) contain sequence elements acquired from invaders that guide CRISPR-associated (Cas) proteins back to the complementary invading DNA or RNA. Here, we have analyzed essential features of the crRNAs associated with the Cas RAMP module (Cmr) effector complex, which cleaves targeted RNAs. We show that Cmr crRNAs contain an 8-nucleotide 5’ sequence tag (also found on crRNAs associated with other CRISPR-Cas pathways) that is critical for crRNA function and can be used to engineer crRNAs that direct cleavage of novel targets. We also present data indicating that the Cmr complex cleaves an endogenous complementary RNA in Pyrococcus furiosus, providing direct in vivo evidence of RNA targeting by the CRISPR-Cas system. Our findings indicate that the CRISPR RNA-Cmr protein pathway may be exploited to cleave RNAs of interest.
Background: Structural rearrangements of the genome resulting in genic imbalance due to copy number change are often deleterious at the organismal level, but are common in immortalized cell lines and tumors, where they may be an advantage to cells. In order to explore the biological consequences of copy number changes in the Drosophila genome, we resequenced the genomes of 19 tissue-culture cell lines and generated RNA-Seq profiles. Results: Our work revealed dramatic duplications and deletions in all cell lines. We found three lines of evidence indicating that copy number changes were due to selection during tissue culture. First, we found that copy numbers correlated to maintain stoichiometric balance in protein complexes and biochemical pathways, consistent with the gene balance hypothesis. Second, while most copy number changes were cell line-specific, we identified some copy number changes shared by many of the independent cell lines. These included dramatic recurrence of increased copy number of the PDGF/VEGF receptor, which is also over-expressed in many cancer cells, and of bantam, an anti-apoptosis miRNA. Third, even when copy number changes seemed distinct between lines, there was strong evidence that they supported a common phenotypic outcome. For example, we found that proto-oncogenes were over-represented in one cell line (S2-DRSC), whereas tumor suppressor genes were under-represented in another (Kc167). Conclusion: Our study illustrates how genome structure changes may contribute to selection of cell lines in vitro. This has implications for other cell-level natural selection progressions, including tumorigenesis. Electronic supplementary material: The online version of this article (doi:10.1186/gb-2014-15-8-r70) contains supplementary material, which is available to authorized users.
X-chromosome inactivation (XCI), the dosage compensation process that equalizes X-linked gene expression between sexes, has mostly been studied in the mouse, where the central role for the non-coding RNA Xist in the initiation and spreading of the process was demonstrated. Although Xist is conserved in humans [1], very little is known concerning its regulation and function in this species. Several lines of evidence moreover suggest that different strategies have been adopted in the human to control XCI as compared to the mouse. In particular, in human pre-implantation development, XIST RNA coats the X chromosome(s) in both male and female embryos without inducing X-chromosome silencing [2]. This indicates that XIST expression and X-inactivation can be uncoupled during human embryogenesis and that other elements likely participate in the control of X chromosome activity in humans. XCI is established early during embryonic development, and embryonic stem cells can be used to decipher the kinetics and the molecular actors of the process. Human female embryonic stem cells (hESC) can be found in different configurations regarding XIST expression: most female hESC have already undergone XCI but tend to spontaneously lose XIST expression [3]. In the course of an RNA-seq analysis of female hESC, we identified an extended and un-annotated transcribed region producing a long unspliced, likely non-coding nuclear RNA. RNA-FISH analysis reveals that this transcript is expressed from, and coats, the active X chromosome. We called this transcript XACT, for X-active coating transcript. In female hESC in which XIST is repressed, XACT is expressed from and coats both Xs, and this correlates with significant reactivation of the inactive X chromosome. Expression of XACT appears to be specific for pluripotent cells as its expression decreases during differentiation.
Finally, we provide evidence that XACT is not conserved in the mouse. In conclusion, we have identified XACT as the first long ncRNA that coats the active X chromosome in humans. Given its expression profile and lack of conservation, it is tempting to speculate that XACT is involved in the peculiar control of XCI initiation in humans.
We have constructed a database of alternatively spliced protein forms (ASP), consisting of 13,384 protein isoform sequences of 4,422 human genes (www.bioinformatics.ucla.edu/ASP). We identified fifty protein domain types that were selectively removed by alternative splicing at much higher frequencies than average (p-value < 0.01). These include many well-known protein-interaction domains (e.g., KRAB; ankyrin repeats; Kelch), including some that have been previously shown to be regulated functionally by alternative splicing (e.g., collagen domain). We present a number of novel examples (Kruppel transcription factors; Pbx2; Enc1) from the ASP database, illustrating how this pattern of alternative splicing changes the structure of a biological pathway by redirecting protein interaction networks at key switch points. Our bioinformatics analysis indicates that a major impact of alternative splicing is removal of protein-protein interaction domains that mediate key linkages in protein interaction networks. ASP expands the available dataset of human alternatively spliced protein forms from 1,989 human genes (SwissProt release 42) to 5,413 (nonredundant set, ASP + SwissProt), a nearly 3-fold increase. ASP will enhance the existing pool of protein sequences that are searched by mass spectrometry software during the identification of peptide fragments.
Recently there has been much interest in assessing the role of alternative splicing in evolution. We have sought to measure functional selection pressure on alternatively spliced single-exon skips, by calculating the fraction that are an exact multiple of 3 nt in length and therefore preserve protein reading-frame in both the exon-inclusion and exon-skip splice forms. The frame-preservation ratio (defined as the number of exons that are an exact multiple of three in length, divided by the number of exons that are not) was slightly above random for both constitutive exons and alternatively spliced exons as a whole in human and mouse. However, orthologous exons that were observed to be alternatively spliced in the expressed sequence tag data from two or more organisms showed a substantially increased bias to be frame-preserving. This effect held true only for exons within the protein coding region, and not the untranslated region. In five animal genomes (human, mouse, rat, zebrafish, Drosophila), we observed an association between these conserved alternative splicing events and increased selection pressure for frame-preservation. Surprisingly, this effect became stronger as a function of decreasing exon inclusion level: for alternatively spliced exons that were included in a majority of the gene's transcripts, the frame-preservation bias was no higher than that of constitutive exons, whereas for alternatively spliced exons that were included in only a minority of the gene's transcripts, the frame-preservation bias increased nearly 20-fold. These data indicate that a subpopulation of modern alternative splicing events was present in the common ancestors of these genomes, and was under functional selection pressure to preserve the protein reading frame.
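The frame-preservation ratio defined above can be illustrated with a minimal Python sketch. The function name and the exon lengths are hypothetical and are not taken from the study; only the definition (exons whose length is a multiple of 3, divided by exons whose length is not) comes from the abstract.

```python
def frame_preservation_ratio(exon_lengths):
    """Count of exons whose length is a multiple of 3, divided by the
    count of exons whose length is not (the definition in the abstract)."""
    preserving = sum(1 for n in exon_lengths if n % 3 == 0)
    disrupting = len(exon_lengths) - preserving
    if disrupting == 0:
        # Every exon preserves the frame; the ratio is unbounded.
        return float("inf")
    return preserving / disrupting


# Hypothetical exon lengths in nucleotides (illustrative only).
lengths = [90, 120, 101, 75, 88, 150, 64, 33]
print(frame_preservation_ratio(lengths))  # 5 preserving / 3 disrupting
```

If exon lengths were equally likely to leave remainder 0, 1, or 2 on division by 3, one third of exons would be frame-preserving and the expected ratio would be 0.5, which is one way to read "random" as a baseline here.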
Evolution of protein sequences is largely governed by purifying selection, with a small fraction of proteins evolving under positive selection. The evolution at synonymous positions in protein-coding genes is not nearly as well understood, with the extent and types of selection remaining largely unclear. A statistical test to identify purifying and positive selection at synonymous sites in protein-coding genes was developed. The method compares the rate of evolution at synonymous sites (Ks) to that in intron sequences of the same gene after sampling the aligned intron sequences to mimic the statistical properties of coding sequences. We detected purifying selection at synonymous sites in approximately 28% of the 1,562 analyzed orthologous genes from mouse and rat, and positive selection in approximately 12% of the genes. Thus, the fraction of genes with readily detectable positive selection at synonymous sites is much greater than the fraction of genes with comparable positive selection at nonsynonymous sites, i.e., at the level of the protein sequence. Unlike other genes, the genes with positive selection at synonymous sites showed no correlation between Ks and the rate of evolution in nonsynonymous sites (Ka), indicating that evolution of synonymous sites under positive selection is decoupled from protein evolution. The genes with purifying selection at synonymous sites showed significant anticorrelation between Ks and expression level and breadth, indicating that highly expressed genes evolve slowly. The genes with positive selection at synonymous sites showed the opposite trend, i.e., highly expressed genes had, on average, higher Ks. For the genes with positive selection at synonymous sites, a significantly lower mRNA stability is predicted compared to the genes with negative selection.
Thus, mRNA destabilization could be an important factor driving positive selection in synonymous sites, probably through regulation of expression at the level of mRNA degradation and possibly also translation rate. So, unexpectedly, we found that positive selection at synonymous sites of mammalian genes is substantially more common than positive selection at the level of protein sequences. Positive selection at synonymous sites might act through mRNA destabilization affecting mRNA levels and translation.
The selection of a DNA barcode in plants has been impeded in part due to the relatively low rates of nucleotide substitution observed at the most accessible plastid markers. However, the absence of consensus also reflects a lack of standards for comparing potential barcode markers. While many publications have suggested a host of plant DNA barcodes, the studies cannot be readily compared with each other through any quantitative or statistical parameter, partly because they put forward no single compelling rationale relevant to the adoption of a DNA barcode in plants. Here, we argue that the efficacy of any particular plant DNA barcode selection should reflect the anticipated performance of the resulting barcode database in assignment of a query sequence to species. While legitimate scientific disagreement exists over the criteria relevant to “database performance”, the notion gives a unifying rationale for prioritizing selection criteria. Accordingly, we suggest a measure of barcode efficacy based on the rationale of database performance, “the probability of correct identification” (PCI). Moreover, the definition of PCI is left flexible enough to handle most of the scientific disagreement over how to best evaluate DNA barcodes. Finally, we consider how different types of barcodes might require different methods of analysis and database design and indicate how the analysis might affect the selection of the most broadly effective barcode for land plants.
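Since the abstract deliberately leaves the definition of PCI flexible, the following Python sketch shows only one possible instantiation, not the authors' method. It assumes pre-aligned, equal-length barcode sequences, nearest-neighbour assignment by Hamming distance, and PCI computed as the fraction of queries assigned to their true species. All names and sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_species(query, reference):
    """Return the species of the closest reference barcode.
    Ties go to the earliest reference entry (an arbitrary choice)."""
    species, _ = min(reference, key=lambda rec: hamming(query, rec[1]))
    return species


def probability_of_correct_identification(queries, reference):
    """Fraction of (true_species, sequence) queries assigned correctly."""
    correct = sum(
        1 for true_species, seq in queries
        if assign_species(seq, reference) == true_species
    )
    return correct / len(queries)


# Hypothetical reference database and query set (illustrative only).
reference = [
    ("A", "ACGTACGTAC"),
    ("B", "ACGTTCGAAC"),
    ("C", "TTGTACGGAC"),
]
queries = [
    ("A", "ACGTACGTAA"),
    ("B", "ACGTTCGAAC"),
    ("C", "TTGTACGGAT"),
    ("B", "ACGTACGTAC"),  # identical to species A's barcode, so misassigned
]
print(probability_of_correct_identification(queries, reference))  # 0.75
```

Swapping in a different distance measure or assignment rule changes the PCI value without changing the overall framing, which is the flexibility the abstract describes.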